**Hana Chockler · Georg Weissenbacher (Eds.)**

# LNCS 10982

# **Computer Aided Verification**

**30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FLoC 2018, Oxford, UK, July 14–17, 2018, Proceedings, Part II**

# Lecture Notes in Computer Science 10982

### Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

# Editorial Board

- David Hutchison, Lancaster University, Lancaster, UK
- Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
- Josef Kittler, University of Surrey, Guildford, UK
- Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
- Friedemann Mattern, ETH Zurich, Zurich, Switzerland
- John C. Mitchell, Stanford University, Stanford, CA, USA
- Moni Naor, Weizmann Institute of Science, Rehovot, Israel
- C. Pandu Rangan, Indian Institute of Technology Madras, Chennai, India
- Bernhard Steffen, TU Dortmund University, Dortmund, Germany
- Demetri Terzopoulos, University of California, Los Angeles, CA, USA
- Doug Tygar, University of California, Berkeley, CA, USA
- Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7407

Hana Chockler • Georg Weissenbacher (Eds.)

# Computer Aided Verification

30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FLoC 2018, Oxford, UK, July 14–17, 2018, Proceedings, Part II

Editors Hana Chockler King's College London UK

Georg Weissenbacher TU Wien Vienna Austria

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-96141-5 ISBN 978-3-319-96142-2 (eBook)
https://doi.org/10.1007/978-3-319-96142-2

Library of Congress Control Number: 2018948145

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

# Preface

It was our privilege to serve as the program chairs for CAV 2018, the 30th International Conference on Computer-Aided Verification. CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. CAV 2018 was held in Oxford, UK, July 14–17, 2018, with the tutorial day on July 13.

This year, CAV was held as part of the Federated Logic Conference (FLoC) event and was collocated with many other conferences in logic. The primary focus of CAV is to spur advances in hardware and software verification while expanding to new domains such as learning, autonomous systems, and computer security. CAV is at the cutting edge of research in formal methods, as reflected in this year's program.

CAV 2018 covered a wide spectrum of subjects, from theoretical results to concrete applications, including papers on the application of formal methods in large-scale industrial settings. It has always been one of the primary interests of CAV to include papers that describe practical verification tools as well as solutions and techniques that ensure the high practical appeal of the results. The proceedings of the conference are published in Springer's Lecture Notes in Computer Science series. A selection of papers was invited to a special issue of Formal Methods in System Design and the Journal of the ACM.

This is the first year that the CAV proceedings are published under an Open Access license, thus giving access to CAV proceedings to a broad audience. We hope that this decision will increase the scope of practical applications of formal methods and will attract even more interest from industry.

CAV received a very high number of submissions this year—215 overall—resulting in a highly competitive selection process. We accepted 13 tool papers and 52 regular papers, which amounts to an acceptance rate of roughly 30% (for both regular papers and tool papers). The high number of excellent submissions in combination with the scheduling constraints of FLoC forced us to reduce the length of the talks to 15 minutes, giving equal exposure and weight to regular papers and tool papers.

The accepted papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. Other notable topics are synthesis, learning, security, and concurrency in the context of formal methods. The proceedings are organized according to the sessions in the conference.

The program featured two invited talks, by Eran Yahav (Technion) on using deep learning for programming, and by Somesh Jha (University of Wisconsin, Madison) on adversarial deep learning. The invited talks this year reflect the growing interest of the CAV community in deep learning and its connection to formal methods. The tutorial day of CAV featured two invited tutorials, by Shaz Qadeer on verification of concurrent programs and by Matteo Maffei on static analysis of smart contracts. The subjects of the tutorials reflect the increasing volume of research on verification of concurrent software and, more recently, on the correctness of smart contracts. As every year, one of the winners of the CAV award also contributed a presentation. The tutorial day featured a workshop in memoriam of Mike Gordon, titled "Three Research Vignettes in Memory of Mike Gordon," organized by Tom Melham and jointly supported by the CAV and ITP communities.

Moreover, we continued the tradition of organizing a LogicLounge. Initiated by the late Helmut Veith at the Vienna Summer of Logic 2014, the LogicLounge is a series of discussions on computer science topics targeting a general audience and has become a regular highlight at CAV. This year's LogicLounge took place at the Oxford Union on the topic of "Ethics and Morality of Robotics," moderated by Judy Wajcman and featuring a panel of experts on the topic: Luciano Floridi, Ben Kuipers, Francesca Rossi, Matthias Scheutz, Sandra Wachter, and Jeannette Wing. We thank May Chan, Katherine Fletcher, and Marta Kwiatkowska for organizing this event, and the Vienna Center for Logic and Algorithms for their support.

In addition, CAV attendees enjoyed a number of FLoC plenary talks and events targeting the broad FLoC community.

In addition to the main conference, CAV hosted the Verification Mentoring Workshop for junior scientists entering the field and a high number of pre- and post-conference technical workshops: the Workshop on Formal Reasoning in Distributed Algorithms (FRIDA), the workshop on Runtime Verification for Rigorous Systems Engineering (RV4RISE), the 5th Workshop on Horn Clauses for Verification and Synthesis (HCVS), the 7th Workshop on Synthesis (SYNT), the First International Workshop on Parallel Logical Reasoning (PLR), the 10th Working Conference on Verified Software: Theories, Tools and Experiments (VSTTE), the Workshop on Machine Learning for Programming (MLP), the 11th International Workshop on Numerical Software Verification (NSV), the Workshop on Verification of Engineered Molecular Devices and Programs (VEMDP), the Third Workshop on Fun With Formal Methods (FWFM), the Workshop on Robots, Morality, and Trust through the Verification Lens, and the IFAC Conference on Analysis and Design of Hybrid Systems (ADHS).

The Program Committee (PC) for CAV consisted of 80 members; we kept the number large to ensure each PC member would have a reasonable number of papers to review and be able to provide thorough reviews. As the review process for CAV is double-blind, we kept the number of external reviewers to a minimum, to avoid accidental disclosures and conflicts of interest. Altogether, the reviewers drafted over 860 reviews and made an enormous effort to ensure a high-quality program. Following the tradition of CAV in recent years, the artifact evaluation was mandatory for tool submissions and optional but encouraged for regular submissions. We used an Artifact Evaluation Committee of 25 members. Our goal for artifact evaluation was to provide friendly "beta-testing" to tool developers; we recognize that developing a stable tool on a cutting-edge research topic is certainly not easy and we hope the constructive comments provided by the Artifact Evaluation Committee (AEC) were of help to the developers. As a result of the evaluation, the AEC accepted 25 of 31 artifacts accompanying regular papers; moreover, all 13 accepted tool papers passed the evaluation. We are grateful to the reviewers for their outstanding efforts in making sure each paper was fairly assessed. We would like to thank our artifact evaluation chair, Igor Konnov, and the AEC for evaluating all artifacts submitted with tool papers as well as optional artifacts submitted with regular papers.

Of course, without the tremendous effort put into the review process by our PC members this conference would not have been possible. We would like to thank the PC members for their effort and thorough reviews.

We would like to thank the FLoC chairs, Moshe Vardi, Daniel Kroening, and Marta Kwiatkowska, for the support provided, Thanh Hai Tran for maintaining the CAV website, and the always helpful Steering Committee members Orna Grumberg, Aarti Gupta, Daniel Kroening, and Kenneth McMillan. Finally, we would like to thank the team at the University of Oxford, who took care of the administration and organization of FLoC, thus making our jobs as CAV chairs much easier.

July 2018

Hana Chockler
Georg Weissenbacher

# Organization

### Program Committee

- Aws Albarghouthi, University of Wisconsin-Madison, USA
- Christel Baier, TU Dresden, Germany
- Clark Barrett, Stanford University, USA
- Ezio Bartocci, TU Wien, Austria
- Dirk Beyer, LMU Munich, Germany
- Per Bjesse, Synopsys Inc., USA
- Jasmin Christian Blanchette, Vrije Universiteit Amsterdam, Netherlands
- Roderick Bloem, Graz University of Technology, Austria
- Ahmed Bouajjani, IRIF, University Paris Diderot, France
- Pavol Cerny, University of Colorado Boulder, USA
- Rohit Chadha, University of Missouri, USA
- Swarat Chaudhuri, Rice University, USA
- Wei-Ngan Chin, National University of Singapore, Singapore
- Hana Chockler, King's College London, UK
- Alessandro Cimatti, Fondazione Bruno Kessler, Italy
- Loris D'Antoni, University of Wisconsin-Madison, USA
- Vijay D'Silva, Google, USA
- Cristina David, University of Cambridge, UK
- Jyotirmoy Deshmukh, University of Southern California, USA
- Isil Dillig, The University of Texas at Austin, USA
- Cezara Dragoi, Inria Paris, ENS, France
- Kerstin Eder, University of Bristol, UK
- Michael Emmi, Nokia Bell Labs, USA
- Georgios Fainekos, Arizona State University, USA
- Dana Fisman, University of Pennsylvania, USA
- Vijay Ganesh, University of Waterloo, Canada
- Sicun Gao, University of California San Diego, USA
- Alberto Griggio, Fondazione Bruno Kessler, Italy
- Orna Grumberg, Technion - Israel Institute of Technology, Israel
- Arie Gurfinkel, University of Waterloo, Canada
- William Harrison, Department of CS, University of Missouri, Columbia, USA
- Gerard Holzmann, Nimble Research, USA
- Alan J. Hu, The University of British Columbia, Canada
- Franjo Ivancic, Google, USA
- Alexander Ivrii, IBM, Israel
- Himanshu Jain, Synopsys, USA
- Somesh Jha, University of Wisconsin-Madison, USA
- Susmit Jha, SRI International, USA
- Ranjit Jhala, University of California San Diego, USA
- Barbara Jobstmann, EPFL and Cadence Design Systems, Switzerland
- Stefan Kiefer, University of Oxford, UK
- Zachary Kincaid, Princeton University, USA
- Laura Kovacs, TU Wien, Austria
- Viktor Kuncak, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Orna Kupferman, Hebrew University, Israel
- Shuvendu Lahiri, Microsoft, USA
- Rupak Majumdar, MPI-SWS, Germany
- Ken McMillan, Microsoft, USA
- Alexander Nadel, Intel, Israel
- Mayur Naik, Intel, USA
- Kedar Namjoshi, Nokia Bell Labs, USA
- Dejan Nickovic, Austrian Institute of Technology AIT, Austria
- Corina Pasareanu, CMU/NASA Ames Research Center, USA
- Nir Piterman, University of Leicester, UK
- Pavithra Prabhakar, Kansas State University, USA
- Mitra Purandare, IBM Research Laboratory Zurich, Switzerland
- Shaz Qadeer, Microsoft, USA
- Arjun Radhakrishna, Microsoft, USA
- Noam Rinetzky, Tel Aviv University, Israel
- Philipp Ruemmer, Uppsala University, Sweden
- Roopsha Samanta, Purdue University, USA
- Sriram Sankaranarayanan, University of Colorado, Boulder, USA
- Martina Seidl, Johannes Kepler University Linz, Austria
- Koushik Sen, University of California, Berkeley, USA
- Sanjit A. Seshia, University of California, Berkeley, USA
- Natasha Sharygina, Università della Svizzera Italiana, Lugano, Switzerland
- Sharon Shoham, Tel Aviv University, Israel
- Anna Slobodova, Centaur Technology, USA
- Armando Solar-Lezama, MIT, USA
- Ofer Strichman, Technion, Israel
- Serdar Tasiran, Amazon Web Services, USA
- Caterina Urban, ETH Zurich, Switzerland
- Yakir Vizel, Technion, Israel
- Tomas Vojnar, Brno University of Technology, Czechia
- Thomas Wahl, Northeastern University, USA
- Bow-Yaw Wang, Academia Sinica, Taiwan
- Georg Weissenbacher, TU Wien, Austria
- Thomas Wies, New York University, USA
- Karen Yorav, IBM Research Laboratory Haifa, Israel
- Lenore Zuck, University of Illinois at Chicago, USA
- Damien Zufferey, MPI-SWS, Germany
- Florian Zuleger, TU Wien, Austria

# Artifact Evaluation Committee


# Additional Reviewers

Cohen, Ernie; Costea, Andreea; Dangl, Matthias; Doko, Marko; Drachsler Cohen, Dana; Dreossi, Tommaso; Dutra, Rafael; Ebrahimi, Masoud; Eisner, Cindy; Fedyukovich, Grigory; Fremont, Daniel; Freund, Stephen; Friedberger, Karlheinz; Ghorbani, Soudeh; Ghosh, Shromona; Goel, Shilpi; Gong, Liang; Govind, Hari; Gu, Yijia; Habermehl, Peter; Hamza, Jad; He, Paul; Heo, Kihong; Holik, Lukas; Humenberger, Andreas; Hyvärinen, Antti; Hölzl, Johannes; Iusupov, Rinat; Jacobs, Swen; Jain, Mitesh; Jaroschek, Maximilian; Jha, Sumit Kumar; Keidar-Barner, Sharon; Khalimov, Ayrat; Kiesl, Benjamin; Koenighofer, Bettina; Krstic, Srdjan; Laeufer, Kevin; Lee, Woosuk; Lemberger, Thomas; Lemieux, Caroline; Lewis, Robert; Liang, Jia; Liang, Jimmy; Liu, Peizun; Lång, Magnus; Maffei, Matteo; Marescotti, Matteo; Mathur, Umang; Miné, Antoine; Mora, Federico; Nevo, Ziv; Ochoa, Martin; Orni, Avigail; Ouaknine, Joel; Padhye, Rohan; Padon, Oded; Partush, Nimrod; Pavlinovic, Zvonimir; Pavlogiannis, Andreas; Peled, Doron; Pendharkar, Ishan; Peng, Yan; Petri, Gustavo; Polozov, Oleksandr; Popescu, Andrei; Potomkin, Kostiantyn; Raghothaman, Mukund; Reynolds, Andrew; Reynolds, Thomas; Ritirc, Daniela; Rogalewicz, Adam; Scott, Joe; Shacham, Ohad; Song, Yahui; Sosnovich, Adi; Sousa, Marcelo; Subramanian, Kausik; Sumners, Rob; Swords, Sol; Ta, Quang Trung; Tautschnig, Michael; Traytel, Dmitriy; Trivedi, Ashutosh; Udupa, Abhishek; van Dijk, Tom; Wendler, Philipp; Zdancewic, Steve; Zulkoski, Ed

# Contents – Part II

### Tools





# Contents – Part I

### Invited Papers


### Program Analysis Using Polyhedra



### Runtime Verification, Hybrid and Timed Systems


### Probabilistic Systems


# Tools

# **Let this Graph Be Your Witness! An Attestor for Verifying Java Pointer Programs**

Hannah Arndt, Christina Jansen, Joost-Pieter Katoen, Christoph Matheja(B), and Thomas Noll

> Software Modeling and Verification Group, RWTH Aachen University, Aachen, Germany matheja@cs.rwth-aachen.de

**Abstract.** We present a graph-based tool for analysing Java programs operating on dynamic data structures. It involves the generation of an abstract state space employing a user-defined graph grammar. LTL model checking is then applied to this state space, supporting both structural and functional correctness properties. The analysis is fully automated, procedure-modular, and provides informative visual feedback including counterexamples in the case of property violations.

# **1 Introduction**

Pointers constitute an essential concept in modern programming languages, and are used for implementing dynamic data structures like lists, trees etc. However, many software bugs can be traced back to the erroneous use of pointers by e.g. dereferencing null pointers or accidentally pointing to wrong parts of the heap. Due to the resulting unbounded state spaces, pointer errors are hard to detect. Automated tool support for validation of pointer programs that provides meaningful debugging information in case of violations is therefore highly desirable.

Attestor is a verification tool that attempts to achieve both of these goals. To this aim, it first constructs an abstract state space of the input program by means of symbolic execution. Each state depicts both links between heap objects and values of program variables using a graph representation. Abstraction is performed on state level by means of graph grammars. They specify the data structures maintained by the program, and describe how to summarise substructures of the heap in order to obtain a finite representation. After labelling each state with propositions that provide information about structural properties such as reachability or heap shapes, the actual verification task is performed in a second step. To this aim, the abstract state space is checked against a user-defined LTL specification. In case of violations, a counterexample is provided.

H. Arndt and C. Matheja—Supported by Deutsche Forschungsgemeinschaft (DFG) Grant No. 401/2-1.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 3–11, 2018. https://doi.org/10.1007/978-3-319-96142-2_1

In summary, Attestor's main features can be characterized as follows:


*Availability.* Attestor's source code, benchmarks, and documentation are available online at https://moves-rwth.github.io/attestor.

# **2 The Attestor Tool**

Attestor is implemented in Java and consists of about 20,000 lines of code (excluding comments and tests). An architectural overview is depicted in Fig. 1. It shows the tool inputs (left), its outputs (right), the Attestor backend with its processing phases (middle), the Attestor frontend (bottom), and the API connecting backend and frontend. These elements are discussed in detail below.

# **2.1 Input**

As shown in Fig. 1 (left), a verification task is given by four inputs. First, the program to be analysed: both Java and Java Bytecode programs with possibly recursive procedures are supported, where the former is translated to the latter prior to the analysis. Second, the specification has to be given as a set of LTL formulae enriched with heap-specific propositions. See Sect. 3 for a representative list of example specifications.

As a third input, Attestor expects the declaration of the graph grammar that guides the abstraction. In order to obtain a finite abstract state space, this grammar is supposed to cover the data structures emerging during program

**Fig. 1.** The Attestor tool

execution. The user may choose from a set of grammar definitions for standard data structures such as singly- and doubly-linked lists and binary trees, provide a manual specification in a JSON-style graph format, or use combinations thereof.

Fourth, additional options can be given that, e.g., define the initial heap configuration(s) (in JSON-style graph format), control the granularity of abstraction and the garbage collection behaviour, or allow the re-use of results of previous analyses in the form of procedure contracts [11,13].

### **2.2 Phases**

Attestor proceeds in six main phases, see Fig. 1 (middle). In the first and third phases, all inputs are parsed and preprocessed: the input program is read and transformed to Bytecode (if necessary), and the input graphs (initial configuration, procedure contracts, and graph grammar), LTL formulae, and further options are read.

Depending on the provided LTL formulae, additional markings are inserted into the initial heap (see [8] for details) in the second phase. They are used to track identities of objects during program execution, which is later required to validate visit and neighbourhood properties during the fifth phase.

In the next phase the actual program analysis is conducted. To this aim, Attestor first constructs the abstract state space as described in Sect. 2.3 in detail. In the fifth phase we check whether the provided LTL specification holds on the state space resulting from the preceding step. We use an off-the-shelf tableau-based LTL model checking algorithm [2].

If desired, during all phases results are forwarded to the API to make them accessible to the frontend or the user directly. We address this output in Sect. 2.4.

### **2.3 Abstract State Space Generation**

The core module of Attestor is the abstract state space generation. It employs an abstraction approach based on hyperedge replacement grammars, whose theoretical underpinnings are described in detail in [9]. It is centred around a graph-based representation of the heap that contains concrete parts side by side with placeholders representing a set of heap fragments of a certain shape. The state space generation loop as implemented in Attestor is shown in Fig. 2.

Initially it is provided with the initial program state(s), that is, the program counter corresponding to the starting statement together with the initial heap configuration(s). From these, Attestor picks a state at random and applies the abstract semantics of the next statement: First, the heap configuration is locally concretised ensuring that all heap parts required for the statement to execute are accessible. This is enabled by applying rules of the input graph grammar in forward direction, which can entail branching in the state space. The resulting configurations are then manipulated according to

**Fig. 2.** State space generation.

the concrete semantics of the statement. At this stage, Attestor automatically detects possible null pointer dereferencing operations as a byproduct of the state space generation. In a subsequent rectification step, the heap configuration is cleared of, e.g., dead variables and garbage (if desired). Consequently, memory leaks are detected immediately. The rectified configuration is then abstracted with respect to the data structures specified by means of the input graph grammar. Complementary to concretisation, this is realised by applying grammar rules in backward direction, which involves a check for embeddings of right-hand sides. A particular strength of our approach is its robustness against local violations of data structures, as it simply leaves the corresponding heap parts concrete. Finalising the abstract execution step, the resulting state is labelled with the atomic propositions it satisfies. This check is efficiently implemented by means of heap automata (see [12,15] for details). By performing a subsumption check on the state level, Attestor detects whether the newly generated state is already covered by a more abstract one that has been visited before. If not, it

**Fig. 3.** Screenshot of Attestor's frontend for state space exploration. (Color figure online)

adds the resulting state to the state space and starts over by picking a new state. Otherwise, it checks whether further states have to be processed or whether a fixpoint in the state space generation is reached. In the latter case, this phase is terminated.
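The loop described above is, at its core, a worklist fixpoint computation. The following Python sketch is illustrative only: `concretise`, `execute`, `rectify`, `abstract`, `label`, and `subsumed_by` are hypothetical placeholders for Attestor's graph-grammar-based components (Attestor itself is written in Java), not its actual API.

```python
from collections import deque

def successors(state, concretise, execute, rectify, abstract, label):
    """One abstract execution step: concretise (forward rule application,
    may branch), execute the concrete semantics, rectify, abstract
    (backward rule application), and label with atomic propositions."""
    result = []
    for concrete in concretise(state):
        s = execute(concrete)   # concrete semantics of the next statement
        s = rectify(s)          # drop dead variables, detect garbage
        s = abstract(s)         # summarise heap parts via grammar rules
        result.append(label(s))
    return result

def generate_state_space(initial_states, step, subsumed_by):
    """Worklist fixpoint: explore until every new state is subsumed
    by an already visited (possibly more abstract) state."""
    states = list(initial_states)
    worklist = deque(initial_states)
    while worklist:
        state = worklist.popleft()
        for succ in step(state):
            # subsumption check against all previously visited states
            if not any(subsumed_by(succ, seen) for seen in states):
                states.append(succ)
                worklist.append(succ)
    return states
```

Because abstraction maps unboundedly many concrete heaps onto finitely many abstract ones, the subsumption check eventually fires on every path and the loop terminates with a finite abstract state space.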

### **2.4 Output**

As shown in Fig. 1 (right), we obtain three main outputs once the analysis is completed: the computed abstract state space, the derived procedure contracts, and the model checking results. For each LTL formula in the specification, results comprise the possible answers "formula satisfied", "formula (definitely) not satisfied", or "formula possibly not satisfied". In case of the latter two, Attestor additionally produces a counterexample, i.e. an abstract trace that violates the formula. If Attestor was able to verify the non-spuriousness of this counterexample (second case), we are additionally given a concrete initial heap that is accountable for the violation and that can be used as a test case for debugging.

Besides the main outputs, Attestor provides general information about the current analysis. These include log messages such as warnings and errors, but also details about settings and runtimes of the analyses. The API provides the interface to retrieve Attestor's outputs as JSON-formatted data.

### **2.5 Frontend**

Attestor features a graphical frontend that visualises inputs as well as results of all benchmark runs. The frontend communicates with Attestor's backend via the API only. In particular, it can be used to display and navigate through the generated abstract state space and counterexample traces.

A screenshot of the frontend for state space exploration is found in Fig. 3. The left panel is an excerpt of the state space. The right panel depicts the currently selected state, where red boxes correspond to variables and constants, circles correspond to allocated objects/locations, and yellow boxes correspond to nonterminals of the employed graph grammar, respectively. Arrows between two circles represent pointers. Further information about the selected state is provided in the topmost panel. Graphs are rendered using cytoscape.js [6].

# **3 Evaluation**

*Tool Comparison.* While there exists a plethora of tools for analysing pointer programs, such as, amongst others, Forester [10], Groove [7], Infer [5], Hip/Sleek [17], Korat [16], Juggrnaut [9], and Tvla [3], these tools differ in multiple dimensions:


*Benchmarks.* Due to the above-mentioned diversity, there is no publicly available and representative set of standardised benchmarks to compare the aforementioned tools [1]. We thus evaluated Attestor on a collection of challenging, pointer-intensive algorithms compiled from the literature [3,4,10,14]. To assess our counterexample generation, we considered invalid specifications, e.g. that a reversed list is the same list as the input list. Furthermore, we injected faults into our examples by swapping and deleting statements.

*Properties.* During state space generation, *memory safety (M)* is checked. Moreover, we consider five classes of properties that are verified using the built-in LTL model checker:


**Table 1.** The experimental results. All runtimes are in seconds. Verification time includes state space generation. SLL (DLL) means singly-linked (doubly-linked) list.


*Setup.* For performance evaluation, we conducted experiments on an Intel Core i7-7500U CPU @ 2.70 GHz with the Java virtual machine (OpenJDK version 1.8.0_151) limited to its default setting of 2 GB of RAM. All experiments were run using the Java benchmarking harness jmh. Our experimental results are shown in Table 1. Additionally, for comparison purposes, we considered Java implementations of benchmarks that have previously been analysed for memory safety by Forester [10], see Table 2.

*Discussion.* The results show that both memory safety (M) and shape (S) are efficiently processed, with regard to both state space size and runtime. This is not surprising as these properties are directly handled by the state space generation engine. The most challenging tasks are the visit (V) and neighbourhood (N) properties as they require to track objects across program executions by means of markings. The latter have a similar impact as pointer variables: increasing their number impedes abstraction as larger parts of the heap have to be kept concrete. This effect can be observed for the Lindstrom tree traversal procedure

**Table 2.** Forester benchmarks (memory safety only). Verification times are in seconds.


where adding one marking (V) and three markings (N) both increase the verification effort by an order of magnitude.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **MaxSMT-Based Type Inference for Python 3**

Mostafa Hassan<sup>1,2</sup>, Caterina Urban<sup>2(B)</sup>, Marco Eilers<sup>2</sup>, and Peter Müller<sup>2</sup>

> <sup>1</sup> German University in Cairo, Cairo, Egypt <sup>2</sup> Department of Computer Science, ETH Zurich, Zurich, Switzerland caterina.urban@inf.ethz.ch

**Abstract.** We present Typpete, a sound type inferencer that automatically infers Python 3 type annotations. Typpete encodes type constraints as a MaxSMT problem and uses optional constraints and specific quantifier instantiation patterns to make the constraint solving process efficient. Our experimental evaluation shows that Typpete scales to real world Python programs and outperforms state-of-the-art tools.

# **1 Introduction**

Dynamically-typed languages like Python have become increasingly popular in the past five years. Dynamic typing enables rapid development and adaptation to changing requirements. On the other hand, static typing offers early error detection, efficient execution, and machine-checked code documentation, and enables more advanced static analysis and verification approaches [15].

For these reasons, Python's PEP 484 [25] has recently introduced optional type annotations in the spirit of gradual typing [23]. The annotations can be checked using MyPy [10]. In this paper, we present our tool Typpete, which automatically infers sound (non-gradual) type annotations and can therefore serve as a preprocessor for other analysis or verification tools.

Typpete performs whole-program type inference, as there are no principal typings in object-oriented languages like Python [1, example in Sect. 1]; the inferred types are correct in the given context but may not be as general as possible. The type inference is constraint-based and relies on the off-the-shelf SMT solver Z3 [7] for finding a valid type assignment for the input program. We show that two main ingredients allow Typpete to scale to real programs: (1) a careful encoding of subtyping that leverages efficient quantifier instantiation techniques [6], and (2) the use of optional type equality constraints, which considerably reduce the solution search space. Whenever a valid type assignment for the input program cannot be found, Typpete encodes type error localization as an optimization problem [19] and reports only a minimal set of unfulfilled constraints to help the user pinpoint the cause of the error.
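The MaxSMT idea can be illustrated with a deliberately naive sketch: hard constraints must hold in any solution, while optional (soft) constraints, such as Typpete's type-equality hints, are merely preferred, and the solver picks an assignment minimizing their violations. The brute-force search below stands in for Z3's MaxSMT engine; the mini-program, variable names, and constraints are invented for illustration and are not Typpete's actual encoding.

```python
from itertools import product

def solve_maxsmt(variables, types, hard, soft):
    """Exhaustive MaxSMT: among type assignments satisfying all hard
    constraints, return one violating the fewest soft constraints
    (None if the hard constraints are unsatisfiable)."""
    best, best_violations = None, None
    for combo in product(types, repeat=len(variables)):
        assign = dict(zip(variables, combo))
        if not all(c(assign) for c in hard):
            continue  # hard constraints are non-negotiable
        violations = sum(not c(assign) for c in soft)
        if best is None or violations < best_violations:
            best, best_violations = assign, violations
    return best

# Hypothetical mini-program:  x = 1;  y = x + 2;  z = some_value
# Hard: x and y must be numeric (they occur in an addition).
# Soft: prefer equal types, shrinking the search space for the solver.
types = ["int", "float", "str"]
hard = [lambda a: a["x"] in ("int", "float"),
        lambda a: a["y"] in ("int", "float")]
soft = [lambda a: a["x"] == a["y"],
        lambda a: a["y"] == a["z"]]
solution = solve_maxsmt(["x", "y", "z"], types, hard, soft)
```

In the real tool the search is delegated to Z3, and when no valid assignment exists, the set of unsatisfied constraints itself becomes the object of optimization for error localization.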

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 12–19, 2018. https://doi.org/10.1007/978-3-319-96142-2_2

**Fig. 1.** A Python implementation of the *odds and evens* hand game.

Typpete accepts programs written in (a large subset of) Python 3. Having a static type system imposes a number of requirements on Python programs: (a) a variable can only have a single type throughout the whole program; (b) generic types have to be homogeneous (e.g., all elements of a set must have the same type); and (c) dynamic code generation, reflection, and dynamic attribute additions and deletions are not allowed. The supported type system includes generic classes and functions. Users must supply an input file and the *number* of type variables for any generic class or function. Typpete then outputs a program with type annotations, a type error, or an error indicating use of unsupported language features.

Our experimental evaluation demonstrates the practical applicability of our approach. We show that Typpete performs well on a variety of real-world open source Python programs and outperforms state-of-the-art tools.

# **2 Constraint Generation**

Typpete encodes the type inference problem for a Python program into an SMT constraint resolution problem such that any solution of the SMT problem yields a valid type assignment for the program. The process of generating the SMT problem consists of three phases, which we describe below.

In a first pass over the input program, Typpete collects: (1) all globally defined names (to resolve forward references), (2) all classes and their respective subclass relations (to define subtyping), and (3) upper bounds on the size of certain types (e.g., tuples and function parameters). This pre-analysis encompasses both the input program—including all transitively imported modules—and *stub files*, which define the types of built-in classes and functions as well as libraries. Typpete already contains stubs for the most common built-ins; users can add custom stub files written in the format that is supported by MyPy.

In the second phase, Typpete declares an algebraic datatype Type, whose members correspond one-to-one to Python types. Typpete declares one datatype constructor for every class in the input program; non-generic classes are represented as constants, whereas a generic class with *n* type parameters is represented by a constructor taking *n* arguments of type Type. As an example, the class Odd in Fig. 1 is represented by the constant classOdd. Typpete also declares constructors for tuples and functions up to the maximum size determined in the pre-analysis, and for all type variables used in generic functions and classes.

The subtype relation <: is represented by an uninterpreted function subtype which maps pairs of types to a Boolean value. This function is delicate to define because of the possibility of *matching loops* (i.e., axioms being endlessly instantiated [7]) in the SMT solver. For each datatype constructor, Typpete generates axioms that explicitly enumerate the possible subtypes and supertypes. As an example, for the type classOdd, Typpete generates the following axioms:

∀t. subtype(classOdd, t) = (t = classOdd ∨ t = classItem ∨ t = classobject)
∀t. subtype(t, classOdd) = (t = classnone ∨ t = classOdd)

Note that the second axiom allows None to be a subtype of any other type (as in Java). As we discuss in the next section, this definition of subtype allows us to avoid matching loops by specifying specific instantiation patterns for the SMT solver. A *substitution* function substitute, which substitutes type arguments for type variables when interacting with generic types, is defined in a similar way.

In the third step, Typpete traverses the program while creating an SMT variable for each node in its abstract syntax tree, and generating type constraints over these variables for the constructs in the program. During the traversal, a *context* maps all defined names (i.e., program variables, fields, etc.) to the corresponding SMT variables. The context is later used to retrieve the type assigned by the SMT solver to each name in the program. Constraints are generated for expressions (e.g., call arguments are subtypes of the corresponding parameter types), statements (e.g., the right-hand side of an assignment is a subtype of the left-hand side), and larger constructs such as methods (e.g., covariance and contravariance constraints for method overrides). For example, the (simplified) constraint generated for the call to item1.compete(item2) at line 21 in Fig. 1 contains a disjunction of cases depending on the type of the receiver:

(v_item1 = classOdd ∧ competeOdd = f2(classOdd, arg, ret) ∧ subtype(v_item2, arg))
∨ (v_item1 = classEven ∧ competeEven = f2(classEven, arg, ret) ∧ subtype(v_item2, arg))

where f2 is a datatype constructor for a function with two parameter types (and one return type ret), and v_item1 and v_item2 are the SMT variables corresponding to item1 and item2, respectively.

The generated constraints guarantee that any solution yields a correct type assignment for the input program. However, there are often many different valid solutions, as the constraints only impose lower or upper bounds on the types represented by the SMT variables (e.g., subtype(v_item2, arg) shown above imposes only an upper bound on the type of v_item2). This has an impact on performance (cf. Sect. 4) as the search space for a solution remains large. Moreover, some type assignments could be more desirable than others for a user (e.g., a user would most likely prefer to assign type int rather than object to a variable initialized with value zero). To avoid these problems, Typpete additionally generates optional type *equality* constraints in places where the mandatory constraints only demand subtyping (i.e., local variable assignments, return statements, passed function arguments), thereby turning the SMT problem into a MaxSMT optimization problem. For instance, in addition to subtype(v_item2, arg) shown above, Typpete generates the optional equality constraint v_item2 = arg. The optional constraints guide the solver to try the specified exact type first, which is often a correct choice and therefore improves performance, and additionally leads to solutions with more precise variable and parameter types.

# **3 Constraint Solving**

Typpete relies on Z3 [7] and the MaxRes [18] algorithm for solving the generated type constraints. We use *e-matching* [6] for instantiating the quantifiers used in the axiomatization of the subtype function (cf. Sect. 2), and carefully choose instantiation patterns that ensure that any choice made during the search immediately triggers the instantiation of the relevant quantifiers. For instance, for the axioms shown in Sect. 2, we use the instantiation patterns subtype(classOdd, t) and subtype(t, classOdd), respectively. Our instantiation patterns ensure that as soon as one argument of an application of the subtype function is known, the quantifier that enumerates the possible values of the other argument is instantiated, thus ensuring that the consequences of any type choices propagate immediately. With a naïve encoding, the solver would have to *guess* both arguments before being able to *check* whether the subtype relation holds. The resulting constraint solving process is much faster than it would be when using different quantifier instantiation techniques such as *model-based quantifier instantiation* [12], but still avoids the potential unsoundness that can occur when using e-matching with insufficient trigger expressions.

When the MaxSMT problem is satisfiable, Typpete queries Z3 for a model satisfying all type constraints, retrieves the types assigned to each name in the program, and generates type annotated source code for the input program. For instance, for the program shown in Fig. 1, Typpete automatically annotates the function evalEven with type Even for the parameter item and a str return type. Note that Item and object would also be correct type annotations for item; the choice of Even is guided by the optional type equality constraints.

When the MaxSMT problem is unsatisfiable, instead of reporting the unfulfilled constraints in the *unsatisfiable core* returned by Z3 (which is not guaranteed to be minimal), Typpete creates a new *relaxed* MaxSMT problem where only the constraints defining the subtype function are enforced, while all other type constraints are optional. Z3 is then queried for a model satisfying as many type constraints as possible. The resulting type annotated source code for the input program is returned along with the remaining minimal set of unfulfilled type constraints. For instance, if we remove the abstract method compete of class Item in Fig. 1, Typpete annotates the parameters of the function match at line 20 with type object and indicates the call compete at line 21 as problematic. By observing the mismatch between the type annotations and the method call, the user has sufficient context to quickly identify and correct the type error.


**Fig. 2.** Evaluation of Typpete on small programs and larger open source projects.

# **4 Experimental Evaluation**

In order to demonstrate the practical applicability of our approach, we evaluated our tool Typpete on a number of real-world open-source Python programs that use inheritance, operator overloading, and other features that are challenging for type inference (but not features that make static typing impossible).


We additionally ran Typpete on our test suite of manually-written programs and small programs collected from the web (47 modules and 1998 LOC).

In order to make the projects statically typeable, we had to make a number of small changes that do not impact the functionality of the code, such as adding abstract superclasses and abstract methods, and (for the **imp** and **scion** projects) introducing explicit downcasts in a few places. Additionally, we made a number of other innocuous changes to overcome the current limitations of our tool, such as replacing keyword arguments with positional arguments, replacing generator expressions with list comprehensions, and inlining super calls. The complete list of changes for each project is included in our artifact.

The experiments were conducted on a 2.9 GHz Intel Core i5 processor with 8 GB of RAM running Mac OS High Sierra version 10.13.3 with Z3 version 4.5.1. Figure 2 summarizes the results of the evaluation. The first two columns show the *average running time* (over ten runs, split into constraint generation and constraint solving) for the type inference with the use of optional type equality constraints (cf. Sect. 2) disabled (SMT) and enabled (MaxSMT), respectively. We can observe that optional type equality constraints considerably reduce the search space for a solution, as disabling them significantly increases the running time for larger projects. We can also note that the constraint solving time improves significantly when the type inference is run on the test suite, which consists of many independent modules. This suggests that splitting the type inference problem into independent sub-problems could further improve performance. We plan to investigate this direction as part of our future work.

The third column of Fig. 2 shows the evaluation of the *error reporting* feature of Typpete (cf. Sect. 3). For each benchmark, we manually introduced two type errors that could organically happen during programming and compared the size of the unsatisfiable core (left of the /) and the number of remaining unfulfilled constraints (right of the /) for the original and relaxed MaxSMT problems given to Z3, respectively. We also list the times needed to prove the first problem unsatisfiable and to solve the relaxed problem. As one would expect, the number of constraints that remain unfulfilled for the relaxed problems is considerably smaller, which demonstrates that the error reporting feature of Typpete greatly reduces the time that a user needs to identify the source of a type error.

Finally, the last column of Fig. 2 shows the result of the *comparison* of Typpete with the state-of-the-art type inferencer Pytype [16]. Pytype infers PEP484 [25] gradual type annotations by abstract interpretation [5] of the bytecode-compiled version of the given Python file. In Fig. 2, for the considered benchmarks, we report the number of variables and parameters that Pytype leaves untyped or annotated with Any. We excluded any module on which Pytype yields an error; in square brackets we indicate the number of modules that we could consider. Typpete is able to fully type all elements and thus outperforms Pytype *for static typing purposes*. On the other hand, we note that Pytype additionally supports gradual typing and a larger Python subset.

### **5 Related and Future Work**

In addition to Pytype, a number of other type inference approaches and tools have been developed for Python. The approach of Maia et al. [17] has some fundamental limitations such as not allowing forward references or overloaded functions and operators. Fritz and Hage [11] as well as Starkiller [22] infer sets of *concrete types* that can inhabit each program variable to improve execution performance. The former sacrifices soundness to handle more dynamic features of Python. Additionally, deriving valid type assignments from sets of concrete types is non-trivial. MyPy and a project by Cannon [3] can perform (incomplete) type inference for local variables, but require type annotations for function parameters and return types. PyAnnotate [13] *dynamically* tracks variable types during execution and optionally annotates Python programs; the resulting annotations are not guaranteed to be sound. A similar spectrum of solutions exists for other dynamic programming languages like JavaScript [2,14] and ActionScript [20].

The idea of using SMT solvers for type inference is not new. Both F\* [24] and LiquidHaskell [26] (partly) use SMT-solving in the inference for their dependent type systems. Pavlinovic et al. [19] present an SMT encoding of the OCaml type system. Typpete's approach to type error reporting can be seen as a simple instantiation of their approach.

As part of our future work, we want to explore whether our system can be adapted to infer gradual types. We also aim to develop heuristics for inferring which functions and classes should be annotated with generic types based on the reported unfulfilled constraints. Finally, we plan to explore the idea of splitting the type inference into multiple separate problems to improve performance.

**Acknowledgments.** We thank the anonymous reviewers for their feedback. This work was supported by an ETH Zurich Career Seed Grant (SEED-32 16-2).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The JKIND Model Checker**

Andrew Gacek<sup>1(B)</sup>, John Backes<sup>1</sup>, Mike Whalen<sup>2</sup>, Lucas Wagner<sup>1</sup>, and Elaheh Ghassabani<sup>2</sup>

> <sup>1</sup> Rockwell Collins, Cedar Rapids, USA
> andrew.gacek@gmail.com, john.backes@gmail.com, lucas.wagner@rockwellcollins.com
> <sup>2</sup> University of Minnesota, Minneapolis, USA
> {mwwhalen,ghass013}@umn.edu

**Abstract.** JKind is an open-source industrial model checker developed by Rockwell Collins and the University of Minnesota. JKind uses multiple parallel engines to prove or falsify safety properties of infinite state models. It is portable, easy to install, competitive in performance with other state-of-the-art model checkers, and has features designed to improve the results presented to users: *inductive validity cores* for proofs and *counterexample smoothing* for test-case generation. It serves as the back-end for various industrial applications.

# **1 Introduction**

JKind is an open-source<sup>1</sup> industrial infinite-state inductive model checker for safety properties. Models and properties in JKind are specified in Lustre [17], a synchronous data-flow language, using the theories of linear real and integer arithmetic. JKind uses SMT-solvers to prove and falsify multiple properties in parallel. A distinguishing characteristic of JKind is its focus on the usability of results. For a proven property, JKind provides traceability between the property and individual model elements. For a falsified property, JKind provides options for simplifying the counterexample in order to highlight the root cause of the failure. In industrial applications, we have found these additional usability aspects to be at least as important as the primary results. Another important characteristic of JKind is that it is designed to be integrated directly into user-facing applications. Written in Java, JKind runs on all major platforms and is easily embedded into other Java applications. JKind bundles the Java-based SMTInterpol solver and has no external dependencies. However, it can optionally call Z3, Yices 1, Yices 2, CVC4, and MathSAT if they are available.

# **2 Functionality and Main Features**

JKind is structured as several parallel engines that coordinate to prove properties, mimicking the design of PKind and Kind 2 [8,21]. Some engines are

<sup>1</sup> https://github.com/agacek/jkind.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 20–27, 2018. https://doi.org/10.1007/978-3-319-96142-2_3

**Fig. 1.** JKind engine architecture

directly responsible for proving properties, others aid that effort by generating invariants, and still others are reserved for post-processing of proof or counterexample results. Each engine can be enabled or disabled separately based on the user's needs. The architecture of JKind allows any engine to broadcast information to the other engines (for example, lemmas, proofs, counterexamples) allowing straightforward integration of new functionality.

The solving engines in JKind are shown in Fig. 1. The **Bounded Model Checking (BMC)** engine performs a standard iterative unrolling of the transition relation to find counterexamples and to serve as the base case of *k*-induction. The BMC engine guarantees that any counterexample it finds is minimal in length. The *k***-Induction** engine performs the inductive step of *k*-induction, possibly using invariants generated by other engines. The **Invariant Generation** engine uses a template-based invariant generation technique [22] using its own *k*-induction loop. The **Property Directed Reachability (PDR)** engine performs property directed reachability [11] using the implicit abstraction technique [9]. Unlike BMC and *k*-induction, each property is handled separately by a different PDR sub-engine. Finally, the **Advice** engine produces invariants based on previous runs of JKind as described in the next section.

Invariant sharing between the solvers (shown in Fig. 1) is an important part of the architecture. In our internal benchmarking, we have found that implicit abstraction PDR performs best when operating over a single property at a time and without use of lemmas generated by other approaches. On the other hand, the invariants generated by PDR and template lemma generation often allow *k*-induction, which operates on all properties in parallel, to substantially reduce the verification time required for models with large numbers of properties.

### **2.1 Post Processing and Re-verification**

A significant part of the research and development effort for JKind has focused on post-processing results for presentation and repeated verification of models under development.

**Inductive Validity Cores (IVC).** For a proven property, an inductive validity core is a subset of Lustre equations from the input model for which the property still holds [13,14]. Inductive validity cores can be used for traceability from property to model elements and determining coverage of the model by a set of properties [15]. This facility can be used to automatically generate traceability and adequacy information (such as traceability matrices [12] important to the certification of safety-critical avionics systems [26]). The IVC engine uses a heuristic algorithm to efficiently produce minimal or nearly minimal cores. In a recent experiment over a superset of the benchmark models described in Sect. 3, we found that our heuristic IVC computation added 31% overhead to model checking time, and yielded cores approximately 8% larger than the guaranteed minimal core computed by a very expensive "brute force" algorithm. As a side-effect, the IVC algorithm also minimizes the set of invariants used to prove a property and emits this reduced set to other engines (notably the *Advice* engine, described below).

**Smoothing.** To aid in counterexample understanding and in creating structural coverage tests that can be more easily explained, JKind provides an optional post-processing step to minimize the number of changes to input variables, *smoothing* the counterexample. For example, applied to 129 test cases generated for a production avionics flight control state machine, smoothing increased runtime by 40% and removed 4 unnecessary input changes per test case on average. The smoothing engine uses a MaxSat query over the original BMC-style unrolling of the transition relation combined with weighted assertions that each input variable does not change on each step. The MaxSat query tries to satisfy all of these weighted assertions, but will break them if needed. This has the effect of trying to hold all inputs constant while still falsifying the original property and only allowing inputs to change when needed. This engine is only available with SMT-solvers that support MaxSat such as Yices 1 and Z3.

**Advice.** The advice engine saves and re-uses the invariants that were used by JKind to prove the properties of a model. Prior to analysis, JKind performs model slicing and flattening to generate a flat transition-relation model. Internally, invariants are stored as a set of proven formulas (in the Lustre syntax) over the variables in the flattened model. An *advice* file is simply the emitted set of these invariant formulas. When a model is loaded, the formulas are loaded into memory. Formulas that are no longer syntactically or type correct are discarded, and the remaining formulas are submitted as an initial set of possible invariants to be proved via *k*-induction. If they are proved, they are passed along to other engines; if falsified, they are discarded. The names JKind constructs are stable across runs, so if a model is unchanged, it can usually be re-proved quickly using the invariants and *k*-induction. If the model is slightly changed, it is often the case that most of the invariants can be re-proved, leading to reduced verification times.

If the IVC engine is also enabled, then advice emits a (close to) minimal set of lemmas used for proof; this often leads to faster re-verification (but more expensive initial verification), and can be useful for examining which of the generated lemmas are useful for proofs.

**Fig. 2.** Performance benchmarks

### **3 Experimental Evaluation**

We evaluated the performance of JKind against Kind 2 [8], Zustre [20], Generalized PDR in Z3 [19], and IC3 in nuXmv [9]. We used the default options for each tool (using check invar ic3 for nuXmv). Our benchmark suite comes from [9] and contains 688 models over the theory of linear integer arithmetic<sup>2</sup>. All experiments were performed on a 64-bit Ubuntu 17.10 Linux machine with a 12-core Intel Xeon CPU E5-1650 v3 @ 3.50 GHz, with 32 GB of RAM and a time limit of 60 s per model.

Performance comparisons are shown in Fig. 2. The key gives the number of benchmarks solved by each tool, and the graph shows the aggregate time required for solving, with per-problem solving times ordered independently for each tool. JKind was able to verify or falsify the most properties, although Z3 was often the fastest tool. Many of the benchmarks in this set are quickly evaluated: Z3 solves the first 400 benchmarks in just over 12 s. Due to JKind's use of Java, the JVM/JKind startup time for an empty model is approximately 0.35 s, which leads to poor performance on small models<sup>3</sup>. As always, such benchmarks should be taken with a large grain of salt. In [8], a different set of benchmarks slightly favored Kind 2, and in [9], nuXmv was the most capable tool. We believe that all the solvers are relatively competitive.

### **4 Integration and Applications**

JKind is the back-end for a variety of user-facing applications. In this section, we briefly highlight a few of these applications and how they employ the features discussed previously.

<sup>2</sup> https://es.fbk.eu/people/griggio/papers/tacas14-ic3ia.tar.bz2. Note that we removed 263 duplicate benchmarks from the original set.

<sup>3</sup> Without startup time, the curve for JKind is close to the curve for Zustre.


# **5 Related Work**

JKind is one of a number of similar infinite-state inductive model checkers including Kind 2 [8], nuXmv [9], Z3 with generalized PDR [19], and Zustre [20]. They operate over a transition relation described either as a Lustre program (Kind 2, JKind, and Zustre), an extension of the SMV language (nuXmv), or as a set of Horn clauses (Z3). Each tool uses a portfolio-based solver approach, with nuXmv, JKind, and Kind 2 all supporting both *k*-induction and a variant of PDR/IC3. nuXmv also supports guided reachability and *k*-liveness. Other tools such as ESBMC-DepthK [25], VVT [4], CPAchecker [5], and CPROVER [7] use similar techniques for reasoning about C programs.

We believe that the JKind IVC support is similar to *proof-core* support provided by commercial hardware model checkers: Cadence Jasper Gold and Synopsys VC Formal [1,2,18]. The proof-core provided by these tools is used for internal coverage analysis measurements performed by the tools. Unfortunately, the algorithms used in the commercial tool support are undocumented and performance comparisons are prohibited by the tool licenses, so it is not possible to compare performance on this aspect.

Previous work has been done on improving the quality of counterexamples along various dimensions similar to the JKind notion of *smoothing*, e.g. [16,24]. Our work is distinguished by its focus on minimizing the number of deltas in the input values. This metric has been driven by user needs and by our own experiences with test-case generation.

There are several tools that support reuse or exchange of verification results, similar to our *advice* feature. Recently, there has been progress on standardized formats [6] of exchange between analysis tools. Our current advice format is optimized for use and performance with our particular tool and designed for reverification rather than exchange of partial verification information. However, supporting a standardized format for exchanging verification information would be a useful feature for future use.

# **6 Conclusion**

JKind is similar to a number of other solvers that each solve infinite state sequential analysis problems. Nevertheless, it has some important features that distinguish it. First, a focus on quality of feedback to users for both valid properties (using IVCs) and invalid properties (using smoothing). Second, it is supported across all major platforms and is straightforward to port due to its implementation in Java. Third, it is small, modular, and well-architected, allowing straightforward extension with new engines. Fourth, it is open-source with a liberal distribution license (BSD), so it can be adapted for various purposes, as demonstrated by the number of tools that have incorporated it.

**Acknowledgments.** The work presented here was sponsored by DARPA as part of the HACMS program under contract FA8750-12-9-0179.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The DEEPSEC Prover**

Vincent Cheval, Steve Kremer, and Itsaka Rakotonirina(B)

INRIA Nancy - Grand-Est & LORIA, Villers-lès-Nancy, France
{vincent.cheval,steve.kremer,itsaka.rakotonirina}@inria.fr

**Abstract.** In this paper we describe the DeepSec prover, a tool for security protocol analysis. It decides equivalence properties modelled as trace equivalence of two processes in a dialect of the applied pi calculus.

### **1 Introduction**

Cryptographic protocols ensure the security of communications. They are distributed programs that make use of cryptographic primitives, e.g. encryption, to ensure security properties, such as confidentiality or anonymity. Their correct design is quite a challenge as security is to be enforced in the presence of an *arbitrary* adversary that controls the communication network and may compromise participants. The use of symbolic verification techniques, in the line of the seminal work by Dolev and Yao [19], has proven its worth in discovering logical vulnerabilities or proving their absence.

Nowadays mature tools exist, e.g., [7,10,24], but they mostly concentrate on *trace properties*, such as authentication and (weak forms of) confidentiality. Unfortunately, many properties need to be expressed in terms of *indistinguishability*, modelled as behavioral equivalences in dedicated process calculi. Typically, a strong version of secrecy states that the adversary cannot distinguish the situation where a value *v*1, respectively *v*2, is used in place of a secret. Privacy properties, e.g., vote privacy, are also stated similarly [2,4,18].

In this paper we present the DeepSec prover (Deciding Equivalence Properties in Security protocols). The tool decides trace equivalence for cryptographic protocols that are specified in a dialect of the applied pi calculus [1]. DeepSec offers several advantages over existing tools, in terms of expressiveness, precision and efficiency: typically we do not restrict the use of private channels, allow else branches, and decide trace equivalence *precisely*, i.e., no approximations are applied. Cryptographic primitives are user specified by a set of subtermconvergent rewrite rules. The only restriction we make on protocol specifications

This work was supported by the ERC (agreement No. 645865-SPOOC) under the EU H2020 research and innovation program, and ANR project TECAP (ANR-17- CE39-0004-01).

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 28–36, 2018. https://doi.org/10.1007/978-3-319-96142-2\_4

is that we forbid unbounded replication, i.e., we restrict the analysis to a finite number of protocol sessions. This restriction is similar to that of several other tools and is sufficient for decidability. Note that decidability is nevertheless non-trivial, as the system under study is still infinite-state due to the active, arbitrary attacker participating in the protocol.

# **2 Description of the Tool**

# **2.1 Example: The Helios Voting Protocol**

An input of DeepSec defines the cryptographic primitives, the protocol and the security properties that are to be verified. Random numbers are abstracted by *names* (*a, b, ...*), cryptographic primitives by *function symbols* with an arity (*f/n*), and messages by *terms*, viewed as *modus operandi* to compute bitstrings. For instance, the functions aenc/3, pk/1 model randomized asymmetric encryption and public-key generation: the term aenc(pk(k), r, m) models the plaintext m encrypted with the public key pk(k) and randomness r. In DeepSec we write:

```
fun aenc/3. fun pk/1.
```

On the other hand, cryptographic destructors are specified by rewrite rules. For example asymmetric decryption (adec) would be defined by

```
reduc adec(k,aenc(pk(k),r,m)) -> m.
```
A plaintext m can thus be retrieved from a ciphertext aenc(pk(k), r, m) and the corresponding private key k. Such user-defined rewrite rules also allow us to describe more complex primitives, such as a zero-knowledge proof (ZKP) asserting knowledge of the plaintext and randomness of a given ciphertext:

```
fun zkp/3.
const zkpok.
reduc check(zkp(r,v,aenc(p,r,v)), aenc(p,r,v)) -> zkpok.
```
Although user-defined, the rewrite system is required by DeepSec to be *subterm convergent*, i.e., for each rule the right-hand side is either a subterm of the left-hand side or a ground term in normal form. Support for tuples and projection is provided by default.
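This side condition is easy to state over a term representation; the following sketch checks it for the two rules above (the tuple encoding of terms is a hypothetical illustration, not DeepSec's internals):

```python
# Hedged sketch: checking the subterm-convergence side condition of a rewrite
# rule. Terms are encoded as tuples ("f", arg1, ...) for function applications
# and plain strings for names/variables (illustrative encoding only).

def subterms(t):
    """Yield t together with all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def is_ground(t, variables):
    """A term is ground if it contains no variable."""
    return all(s not in variables for s in subterms(t) if isinstance(s, str))

def subterm_condition(lhs, rhs, variables):
    """The rhs is a subterm of the lhs, or a ground term (assumed in normal form)."""
    return rhs in subterms(lhs) or is_ground(rhs, variables)

# adec(k, aenc(pk(k), r, m)) -> m : the rhs m is a subterm of the lhs.
lhs = ("adec", "k", ("aenc", ("pk", "k"), "r", "m"))
assert subterm_condition(lhs, "m", {"k", "r", "m"})

# check(zkp(r,v,aenc(p,r,v)), aenc(p,r,v)) -> zkpok : the rhs is a constant.
lhs2 = ("check", ("zkp", "r", "v", ("aenc", "p", "r", "v")),
        ("aenc", "p", "r", "v"))
assert subterm_condition(lhs2, "zkpok", {"r", "v", "p"})
```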

*Protocol Specification.* Honest participants in a protocol are modeled as processes. For instance, the process Voter(auth,id,v,pkE) describes a voter in the Helios voting protocol. The process has four arguments: an authenticated channel auth, the voter's identifier id, its vote v and the public key of the tally pkE.

```
let Voter(auth,id,v,pkE) =
  new r;
  let bal = aenc(pkE,r,v) in
  out(auth,bal);
  out(c, (id, bal, zkp(r,v,bal))).
let VotingSystem(v1,v2) =
  new k; new auth1; new auth2;
  out(c,pk(k)); (
    Voter(auth1,id1,v1,pk(k)) |
    Voter(auth2,id2,v2,pk(k)) |
    Tally(k,auth1,auth2) ).
```
The voter first generates a random number r that will be used for the encryption and the ZKP. She then encrypts her vote and binds the resulting ciphertext to the variable bal, which is output on the channel auth. Finally, she outputs the ballot, her id and the corresponding ZKP on the public channel c. All in all, the process VotingSystem(v1,v2) represents the complete voting scheme: two honest voters id1 and id2 vote for v1 and v2, respectively; the

process Tally collects the ballots, checks the ZKP and outputs the result of the election. The instances of the processes Voter and Tally are executed concurrently, modeled by the parallel operator |. Other operators supported by DeepSec include input on a channel (in(c,x); P), conditional (if u = v then P else Q) and non-deterministic choice (P+Q).

*Security Properties.* DeepSec focuses on properties modelled as trace equivalence, e.g., vote privacy [18] in the Helios protocol. We express it as indistinguishability of two instances of the protocol that swap the votes of two honest voters:

```
query trace_equiv(VotingSystem(yes,no),VotingSystem(no,yes)).
```
DeepSec checks that an attacker, implicitly modelled by the notion of trace equivalence, cannot distinguish between these two instances. Note that all actions of dishonest voters can be seen as actions of this single attacker entity; thus only honest participants need to be specified in the input file.

# **2.2 The Underlying Theory**

We give here a high-level overview of how DeepSec decides trace equivalence. Further intuition and details can be found in [14].

*Symbolic Setting.* Although finite-depth, even non-replicated protocols have an infinite state space. Indeed, a simple input in(c,x) induces infinitely many potential transitions in the presence of an active attacker. We therefore define a *symbolic calculus* that abstracts concrete inputs by symbolic variables, together with *constraints* that restrict their concrete instances. Constraints typically range over *deducibility constraints* ("the attacker is able to craft some term after spying on public channels") and *equations* ("two terms are equal"). A symbolic semantics then performs symbolic inputs and collects constraints on them. Typically, executing the input in(c,x) generates a deducibility constraint on *x* to model that the attacker is able to craft the message to be input; equations are generated by conditionals, relying on most general unifiers modulo the equational theory.
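The bookkeeping can be sketched as follows (the state and constraint representations are hypothetical simplifications, not DeepSec's internals):

```python
# Hedged sketch: a symbolic input adds a fresh variable plus a deducibility
# constraint over the frame (the messages the attacker observed so far);
# a symbolic output extends that frame. Representations are illustrative only.
from itertools import count

fresh = count()

def exec_input(state):
    """Successor symbolic state for in(c, x): x must be attacker-deducible."""
    x = "x{}".format(next(fresh))
    constraints = state["constraints"] + [("deducible", tuple(state["frame"]), x)]
    return {"frame": state["frame"], "constraints": constraints,
            "vars": state["vars"] + [x]}

def exec_output(state, term):
    """Successor state for out(c, term): the attacker learns term."""
    return {**state, "frame": state["frame"] + [term]}

s0 = {"frame": [], "constraints": [], "vars": []}
s1 = exec_output(s0, "pk(k)")        # the public key is published
s2 = exec_input(s1)                  # the attacker crafts an input from pk(k)
assert s2["constraints"] == [("deducible", ("pk(k)",), "x0")]
```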

*Decision Procedure.* DeepSec constructs a so-called *partition tree* to guide the decision of (in)equivalence of processes *P* and *Q*. Its nodes are labelled by sets of symbolic processes and constraints; typically, the root contains *P* and *Q* with empty constraints. The tree is constructed similarly to the (finite) tree of all symbolic executions of *P* and *Q*, except that some nodes may be merged or split according to a constraint-solving procedure. DeepSec thereby enforces that concrete instances of processes in the same node are statically indistinguishable.

The final decision criterion is that *P* and *Q* are equivalent *iff* every node of the partition tree contains both a process originating from *P* and a process originating from *Q* by symbolic execution. The DeepSec prover thus returns an attack *iff* it finds a node violating this property while constructing the partition tree.
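Assuming nodes tagged with the origin of each symbolic process, the criterion amounts to a tree search for a node missing one origin; the following sketch illustrates it (the node layout is hypothetical):

```python
# Hedged sketch of the final decision criterion on the partition tree: P and Q
# are trace equivalent iff every node contains a symbolic process originating
# from P and one originating from Q. The node layout here is hypothetical.

class Node:
    def __init__(self, processes, children=()):
        self.processes = list(processes)   # (origin, symbolic_process) pairs
        self.children = list(children)

def find_attack_node(root):
    """Return the first node missing one origin (an attack witness), or None."""
    stack = [root]
    while stack:
        node = stack.pop()
        origins = {origin for origin, _ in node.processes}
        if origins != {"P", "Q"}:
            return node
        stack.extend(node.children)
    return None

# Toy tree: the leaf is only reachable by executions of P, so the attacker
# can distinguish P from Q there.
leaf = Node([("P", "out(c,a)")])
root = Node([("P", "0"), ("Q", "0")], [leaf])
assert find_attack_node(root) is leaf
```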

### **2.3 Implementation**

DeepSec is implemented in OCaml (16k LOC); the source code is licensed under GPL 3.0 and publicly available [17]. Running DeepSec yields a terminal output summarising the results, while a more detailed output is displayed graphically in an HTML interface (using the MathJax API [20]). When the query is not satisfied, the interface interactively shows how to mount the attack.

*Partial-Order Reductions.* Tools verifying equivalences for a bounded number of sessions suffer from a combinatorial explosion as the number of sessions increases. We therefore implemented state-of-the-art partial-order reductions (POR) [8] that eliminate redundant interleavings, providing a significant speedup. This is only possible for a restricted class of processes (*determinate* processes), but DeepSec automatically checks whether POR can be activated.

*Parallelism.* DeepSec generates a partition tree (cf. Sect. 2.2) to decide trace equivalence. As sibling nodes are independent, the computation on subtrees can be parallelised. However, the partition tree is not balanced, which makes it hard to balance the load. One natural solution would be to systematically add child nodes to a queue of pending jobs, but this would incur a significant communication overhead. Consequently, we apply this method only until the size of the queue exceeds a given threshold; afterwards, each idle process fetches a node and computes the *complete* corresponding subtree. Distributed computation over n cores is activated by the option -distributed n. By default, the threshold in the initial generation of the partition tree depends on n, but it may be overridden to m with the option -nb_sets m.
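The two-phase scheme described above can be sketched sequentially (a toy model of the scheduling idea only; DeepSec's actual distribution code is in OCaml and not shown here):

```python
# Hedged sketch of the two-phase load balancing described above: expand the
# partition tree breadth-first into a pool of pending jobs until a threshold
# is reached, then let each idle worker exhaust one complete subtree.
from collections import deque

def initial_jobs(root, children, threshold):
    """Phase 1: split the tree into at least `threshold` independent jobs."""
    frontier, jobs = deque([root]), []
    while frontier and len(frontier) + len(jobs) < threshold:
        node = frontier.popleft()
        kids = children(node)
        if kids:
            frontier.extend(kids)
        else:
            jobs.append(node)  # a leaf is already a complete (trivial) job
    return jobs + list(frontier)

def explore_subtree(root, children):
    """Phase 2: one worker computes the complete subtree below a node."""
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(children(node))
    return count

# Toy partition tree as an adjacency map (hypothetical shape).
tree = {0: [1, 2], 1: [3, 4]}
children = lambda n: tree.get(n, [])
jobs = initial_jobs(0, children, threshold=3)
assert jobs == [2, 3, 4]   # three independent jobs for the workers
```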

# **3 Experimental Evaluation**

*Comparison to Other Work.* When the number of sessions is unbounded, equivalence is undecidable. Verification tools in this setting therefore have to sacrifice termination and generally verify only the finer *diff-equivalence* [9,11,23], which is too fine-grained on many examples. We therefore focus on tools comparable to DeepSec, i.e., those that bound the number of sessions. SPEC [25,26] verifies a sound symbolic bisimulation, but is restricted to fixed cryptographic primitives (pairing, encryption, signatures, hashes) and does not allow else branches. APTE [13] covers the same primitives but allows else branches and decides trace equivalence exactly. By contrast, Akiss [12] allows user-defined primitives and terminates when they form a subterm-convergent rewrite system. However, Akiss only decides trace equivalence without approximation for a subclass of processes (*determinate* processes) and may perform under- and over-approximations otherwise. Sat-Eq [15] proceeds differently, by reducing the equivalence problem to graph planning and SAT solving: the tool is faster than the others by several orders of magnitude, but is quite restricted in scope (it currently supports pairing and symmetric encryption, and can only analyse a subclass of determinate processes). Besides, Sat-Eq may not terminate.

*Authentication.* Figure 1 displays a sample of our benchmarks (complete results can be found in [17]). DeepSec clearly outperforms Akiss, APTE, and SPEC, but Sat-Eq takes the lead as the number of sessions increases. However, the Otway-Rees protocol already illustrates the limited scope of Sat-Eq.

Besides, as previously mentioned, DeepSec includes partial-order reductions (POR). We performed experiments with and without this optimisation: for example, protocols requiring more than 12 h of computation time without POR can be verified in less than a second with it. Note that Akiss and APTE also implement the same POR techniques as DeepSec.


**Fig. 1.** Benchmark results on classical authentication protocols


**Fig. 2.** Benchmark results for verifying privacy type properties

*Privacy.* We also verified privacy properties on the private authentication protocol [2], the passive-authentication and basic-access-control protocols of the e-passport [21], the AKA protocol of 3G telephony networks [6], and the voting protocols Helios [3] and Prêt-à-Voter [22]. DeepSec is the only tool that can prove vote privacy for the two voting protocols, and private authentication is out of the scope of Sat-Eq and SPEC. Besides, we analysed variants of the Helios voting protocol, based on the work of Arapinis et al. [5] (see Fig. 2). The *vanilla* version is known to be vulnerable to a ballot-copy attack [16], which is patched by ballot weeding (W) or a zero-knowledge proof (ZKP). DeepSec proved that (*i*) when no revote is allowed, or (*ii*) when each honest voter only votes once and a dishonest voter is allowed to revote, both patches are secure. However, only the ZKP variant remains secure when honest voters are allowed to revote.

*Parallelism.* Experiments were carried out on a server with 40 Intel Xeon E5-2687W v3 CPUs at 3.10 GHz, 50 GB of RAM and 25 MB of L3 cache, using 35 cores (Server 1). However, parallelisation exhibited some unexpected behavior. For example, on the Yahalom-Lowe protocol, using too many cores on the same server negatively impacts performance: e.g., on Server 1, optimal results are achieved using only 20 to 25 cores. In comparison, optimal results required 40–45 cores on a server with 112 Intel Xeon E7-4850 v3 CPUs at 2.20 GHz, 1.5 TB of RAM and 35 MB of L3 cache (Server 2). This difference may be explained by cache capacity: overloading a server with processes (which share the cache) beyond a certain threshold should indeed make the hit/miss ratio drop. This is consistent with Server 2, which has a larger cache, efficiently exploiting more cores than Server 1. Using the perf profiling tool, we confirmed that the number of cache references per second (CRPS) stayed relatively stable up to the optimal number of cores and quickly decreased beyond it (Fig. 3).

DeepSec can also distribute on multiple servers, using SSH connections. Despite a communication overhead, multi-server computation may be a way to partially avoid the server-overload issue discussed above. For example, the

**Fig. 3.** Performance analysis on Yahalom-Lowe protocol with 23 roles

verification of the Helios protocol (Dishonest revote W) on 3 servers (using resp. 10, 20 and 40 cores) resulted in a running time of 18 m 14 s, while the same verification took 51 m 49 s on a 70-core server (also launched remotely via SSH).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **SimpleCAR: An Efficient Bug-Finding Tool Based on Approximate Reachability**

Jianwen Li1(B) , Rohit Dureja<sup>1</sup>, Geguang Pu<sup>2</sup>, Kristin Yvonne Rozier<sup>1</sup>, and Moshe Y. Vardi<sup>3</sup>

<sup>1</sup> Iowa State University, Ames, IA, USA lijwen2748@gmail.com
<sup>2</sup> East China Normal University, Shanghai, China
<sup>3</sup> Rice University, Houston, TX, USA

**Abstract.** We present a new safety hardware model checker SimpleCAR that serves as a reference implementation for evaluating Complementary Approximate Reachability (CAR), a new SAT-based model checking framework inspired by classical reachability analysis. The tool gives a "bottom-line" performance measure for comparing future extensions to the framework. We demonstrate the performance of SimpleCAR on challenging benchmarks from the Hardware Model Checking Competition. Our experiments indicate that SimpleCAR is particularly suited for unsafety checking, or *bug-finding*; it is able to solve 7 unsafe instances within 1 h that are not solvable by any other state-of-the-art techniques, including BMC and IC3/PDR, within 8 h. We also identify a bug (reports safe instead of unsafe) and 48 counterexample generation errors in the tools compared in our analysis.

# **1 Introduction**

Model checking techniques are widely used in proving design correctness and have received unprecedented attention in the hardware design community [9,16]. Given a system model *M* and a property *P*, model checking proves whether or not *P* holds for *M*. A model checking algorithm exhaustively checks all behaviors of *M* and returns a counterexample as evidence if any behavior violates the property *P*. The counterexample gives an execution of the system that leads to the property failure, i.e., a *bug*. In particular, if *P* is a safety property, model checking reduces to reachability analysis, and the provided counterexample has finite length. Popular safety checking techniques include Bounded Model Checking (BMC) [10], Interpolation Model Checking (IMC) [21], and IC3/PDR [12,14]. It is well known that there is no "universal" algorithm in model checking; different algorithms perform differently on different problem instances [7]. BMC outperforms IMC on checking unsafe instances, while IC3/PDR can solve instances that BMC cannot, and vice versa [19]. Therefore, BMC and IC3/PDR are the most popular algorithms in the portfolio for unsafety checking, or *bug-finding*.

Complementary Approximate Reachability (CAR) [19] is a SAT-based model checking framework for reachability analysis. Contrary to reachability analysis via IC3/PDR, CAR maintains two sequences of over- and under-approximate reachable state sets. The over-approximate sequence is used for safety checking and the under-approximate sequence for unsafety checking. Unlike IC3/PDR, CAR does not require the over-approximate sequence to be monotone. Both forward (Forward-CAR) and backward (Backward-CAR) reachability analysis are permissible in the CAR framework. Preliminary results show that Forward-CAR complements IC3/PDR on safe instances [19].

We present SimpleCAR, a tool specifically developed for evaluating and extending the CAR framework. The new tool is a complete rewrite of CARChecker [19] with several improvements and added capabilities. SimpleCAR has a lighter and cleaner implementation than CARChecker, which integrates several heuristics that help Forward-CAR complement IC3/PDR. Although useful, these heuristics make it difficult to understand and extend the core functionality of CAR. Like IC3/PDR, the performance of CAR varies significantly with the heuristics used [17]. Therefore, it is necessary to provide a basic implementation of CAR (without code-bloating heuristics) that serves as a "bottom-line" performance measure for all future extensions. To that end, SimpleCAR differs from CARChecker in the following aspects:


We apply SimpleCAR to 748 benchmarks from the Hardware Model Checking Competition (HWMCC) 2015 [2] and 2017 [3], and compare its performance to reachability analysis algorithms (BMC, IMC, 4 × IC3/PDR, Avy [22], Quip [18]) in state-of-the-art model checking tools (ABC, nuXmv, IIMC, IC3Ref). Our extensive experiments reveal that Backward-CAR is particularly suited for unsafety checking: it can solve 8 instances within a 1-h time limit, 7 of which are not solvable by BMC and IC3/PDR even within an 8-h time limit. We conclude that, along with BMC and IC3/PDR, CAR is an important candidate in the portfolio of unsafety checking algorithms, and SimpleCAR provides an easy and efficient way to evaluate, experiment with, and add enhancements to the CAR framework. We identify 1 major bug and 48 counterexample generation errors in our evaluated tool set; all have been reported to the tool developers.

# **2 Algorithms and Implementation**

We present a very high-level overview of the CAR framework (refer to [19] for details). CAR is a SAT-based framework for reachability analysis. It maintains an over- and an under-approximate reachable state sequence, for safety and unsafety checking respectively. CAR can be implemented symmetrically in either the forward (Forward-CAR) or the backward (Backward-CAR) mode. In the forward mode, the F-sequence (*F*0, *F*1, ..., *Fi*) is the over-approximate sequence, while the B-sequence (*B*0, *B*1, ..., *Bi*) is under-approximate. The roles of the F- and B-sequences are reversed in the backward mode. We focus here on the backward mode of CAR, Backward-CAR (refer to [19] for Forward-CAR).

# **2.1 High-Level Description of** Backward-CAR

A frame *Fi* in the F-sequence denotes the set of states that are reachable from the initial states (*I*) in *i* steps. Similarly, a frame *Bi* in the B-sequence denotes the set of states that can reach the bad states (*¬P*) in *i* steps. Let *R*(*Fi*) represent the set of successor states of *Fi*, and *R*<sup>−1</sup>(*Bi*) represent the set of predecessor states of *Bi*. Table 1 shows the constraints on the sequences and their usage in Backward-CAR for safety and unsafety checking.

Let *S*(*F*) = *F*0 ∪ *F*1 ∪ ... ∪ *Fi* and *S*(*B*) = *B*0 ∪ *B*1 ∪ ... ∪ *Bi*. Algorithm 1 gives a description of Backward-CAR. The B-sequence is extended exactly once in every iteration of the loop in lines 2–8, but the F-sequence may be extended multiple times in each loop iteration in lines 3–5.

**Table 1.** Sequences in Backward-CAR.

**Alg. 1.** High-level description of Backward-CAR

```
1: F0 = I, B0 = ¬P, k = 0;
2: while true do
3:     while S(B) ∩ R(S(F)) ≠ ∅ do
4:         update F- and B-sequences;
5:         if ∃i · Fi ∩ ¬P ≠ ∅ then return unsafe;
6:     perform propagation on B-sequence (optional);
7:     if ∃i · Bi+1 ⊆ B0 ∪ ... ∪ Bi then return safe;
8:     k = k + 1 and Bk = ¬P;
```
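To make the control flow concrete, here is a heavily simplified explicit-state analogue of the B-sequence bookkeeping (plain sets over a toy transition relation instead of SAT-based symbolic frames, with the F-sequence collapsed to the initial states; it illustrates only the safe/unsafe checks, not CAR itself):

```python
# Explicit-state analogue of Backward-CAR's B-sequence checks (illustrative):
# B0 = ¬P; each new frame holds the predecessors of the last one. We report
# "unsafe" when a frame reaches the initial states, and "safe" when a new
# frame adds no state beyond B0 ∪ ... ∪ Bi (the fixpoint test of line 7).

def backward_reach(states, trans, init, bad):
    # Predecessor map: pre[s] = {t | s is a successor of t}.
    pre = {s: {t for t in states if s in trans.get(t, ())} for s in states}
    frames = [set(bad)]                                   # B0 = ¬P
    while True:
        if frames[-1] & set(init):
            return "unsafe"
        nxt = set().union(*(pre[s] for s in frames[-1])) if frames[-1] else set()
        if nxt <= set().union(*frames):                   # Bi+1 ⊆ B0 ∪ ... ∪ Bi
            return "safe"
        frames.append(nxt)

# Toy systems: 0 -> 1 -> 2 with bad state 2 is unsafe from initial state 0;
# making state 2 unreachable renders the system safe.
assert backward_reach({0, 1, 2}, {0: [1], 1: [2]}, {0}, {2}) == "unsafe"
assert backward_reach({0, 1, 2}, {0: [1], 1: [0]}, {0}, {2}) == "safe"
```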

As a result, CAR normally returns counterexamples of greater depth than the length of the B-sequence. Due to this inherent feature of the framework, CAR is able to complement BMC and IC3/PDR on unsafety checking.

### **2.2 Tool Implementation**

SimpleCAR is publicly available [5,6] under the GNU GPLv3 license. The tool implementation is as follows:


# **3 Experimental Analysis**

# **3.1 Strategies**

**Tools.** We consider six model checking tools in our evaluation: ABC 1.01 [13], IIMC 2.0<sup>1</sup>, Simplic3 [17] (the IC3 algorithms used by nuXmv for finite-state systems<sup>2</sup>), IC3Ref [4], CARChecker [19], and SimpleCAR. For ABC, we evaluate BMC (bmc2), IMC (int), and PDR (pdr). There are three different versions of BMC in ABC: bmc, bmc2, and bmc3; we chose bmc2 based on our preliminary analysis, since it outperforms the other versions. Simplic3 proposes different configuration options for IC3. We use the three *best candidate* configurations for IC3 reported in [17], as well as the Avy algorithm [22] in Simplic3. We consider CARChecker as the original implementation of the CAR framework and use it as a reference implementation for SimpleCAR. A summary of the tools and the arguments used in the experiments is shown in Table 2. Overall, we consider four categories of algorithms implemented in the tools: BMC, IMC, IC3/PDR, and CAR.

**Benchmarks.** We evaluate all tools against 748 benchmarks in the *aiger* format [11] from the SINGLE safety property track of the HWMCC in 2015 and 2017.

**Error Checking.** We check correctness of results from the tools in two ways:


**Platform.** Experiments were performed on Rice University's DavinCI cluster, which comprises 192 nodes running at 2.83 GHz with 48 GB of memory and RedHat 6.0. We set the memory limit to 8 GB with a wall-time limit of one hour. Each model checking run has exclusive access to a node. A time penalty of one hour is assigned to benchmarks that cannot be solved within the time/memory limits.

<sup>1</sup> We use version 2.0 available at https://ryanmb.bitbucket.io/truss/ – similar to the version available at https://github.com/mgudemann/iimc with addition of Quip [18].

<sup>2</sup> Personal communication with Alberto Griggio.


**Table 2.** Tools and algorithms (with category) evaluated in the experiments.

∗ with heuristics for *minimal unsat core* (MUC) [20], partial assignment [23], and propagation.
† no heuristics.
‡ with heuristic for PDR-like clause propagation.

### **3.2 Results**

**Error Report.** We identify one bug in simplic3-best3 (it reports safe instead of unsafe), and 48 errors with respect to counterexample generation in the iimc-quip algorithm (26) and in all algorithms of the Simplic3 tool (22). At the time of writing, the bug report sent to the developers of Simplic3 has been confirmed. In our analysis, we assume the results from these tools to be correct.

**Coarse Analysis.** We focus our analysis on unsafety checking. Figure 1 shows the total number of unsafe benchmarks solved by each category (assuming a portfolio run of all algorithms in a category). CAR **complements** BMC **and** IC3/PDR **by solving 128 benchmarks, of which 8 are not solved by any other category.** Although CAR solves the fewest benchmarks in total, the count of its uniquely solved benchmarks is comparable to that of the other categories. When the wall-time limit is increased to 8 h (the memory limit does not change), BMC and IC3/PDR can solve only one of the 8 uniquely solved

**Fig. 1.** Number of benchmarks solved by each algorithm category (run as a portfolio). Uniquely solved benchmarks are not solved by any other category.

**Fig. 2.** Number of benchmarks solved by every algorithm in a category. Distinctly solved benchmarks by an algorithm are not solved by any algorithm in other categories. The set union of distinctly solved benchmarks for all algorithms in a category equals the count of uniquely solved for that category in Fig. 1.

benchmarks by CAR. The analysis supports our claim that CAR complements BMC/IC3/PDR on unsafety checking.

**Granular Analysis.** Figure 2 shows how each algorithm in the IC3/PDR (Fig. 2a) and CAR (Fig. 2b) categories performs on the benchmarks. simpcar-bp **distinctly solves all 8 benchmarks uniquely solved by the** CAR **category (Fig. 1), while no single** IC3/PDR **algorithm distinctly solves all uniquely solved benchmarks in the** IC3/PDR **category.** In fact, a portfolio including at least abc-pdr, simplic3-best1, and simplic3-best2 is needed to solve all 8 instances uniquely solved by the IC3/PDR category. It is important to note that SimpleCAR is a very basic implementation of the CAR framework compared to the highly optimized implementations of IC3/PDR in other tools. Even then, simpcar-b **outperforms four** IC3/PDR **implementations.** Our results show that Backward-CAR is a favorable algorithm for unsafety checking.

**Analysis Conclusions.** Backward-CAR presents a more promising research direction than Forward-CAR for unsafety checking. We conjecture that the performance of Forward- and Backward-CAR varies with the structure of the aiger model. Heuristics and performance gain present a trade-off: simpcar-bp performs better than the heuristic-heavy carchk-b. On the other hand, although simpcar-bp solves the most unsafe benchmarks in the CAR category, adding the "propagation" heuristic affects its performance: there are several benchmarks solved by simpcar-b but not by simpcar-bp.

# **4 Summary**

We present SimpleCAR, a safety model checker based on the CAR framework for reachability analysis. Our tool is a lightweight and extensible implementation of CAR whose performance is comparable to other state-of-the-art implementations of highly optimized unsafety checking algorithms, and it complements existing algorithm portfolios. Our empirical evaluation reveals that adding heuristics does not always improve performance. We conclude that Backward-CAR is a more promising research direction than Forward-CAR for unsafety checking, and our tool serves as the "bottom-line" for all future extensions to the CAR framework.

**Acknowledgments.** This work is supported by NSF CAREER Award CNS-1552934, NASA ECF NNX16AR57G, NSF CCF-1319459, and NSFC 61572197 and 61632005 grants. Geguang Pu is also partially supported by MOST NKTSP Project 2015BAG19B02 and STCSM Project No. 16DZ1100600.

### **References**



# **StringFuzz: A Fuzzer for String Solvers**

Dmitry Blotsky1(B), Federico Mora<sup>2</sup>, Murphy Berzish<sup>1</sup>, Yunhui Zheng<sup>3</sup>, Ifaz Kabir<sup>1</sup>, and Vijay Ganesh<sup>1</sup>

> <sup>1</sup> University of Waterloo, Waterloo, Canada *{*dblotsky,vganesh*}*@uwaterloo.ca <sup>2</sup> University of Toronto, Toronto, Canada fmora@cs.toronto.edu <sup>3</sup> IBM T.J. Watson Research Center, Yorktown Heights, USA

**Abstract.** In this paper, we introduce StringFuzz: a modular SMT-LIB problem instance transformer and generator for string solvers. We supply a repository of instances generated by StringFuzz in SMT-LIB 2.0/2.5 format. We systematically compare Z3str3, CVC4, Z3str2, and Norn on groups of such instances, and identify those that are particularly challenging for some solvers. We briefly explain our observations and show how StringFuzz helped discover causes of performance degradations in Z3str3.

### **1 Introduction**

In recent years, many algorithms for solving string constraints have been developed and implemented in SMT solvers such as Norn [6], CVC4 [12], and Z3 (e.g., Z3str2 [13] and Z3str3 [7]). To validate and benchmark these solvers, their developers have relied on hand-crafted input suites [1,4,5] or real-world examples from a limited set of industrial applications [2,11]. These test suites have helped developers identify implementation defects and develop more sophisticated solving heuristics. Unfortunately, as more features are added to solvers, these benchmarks often remain stagnant, leaving increasing functionality untested. As such, there is an acute need for a more robust, inexpensive, and automatic way of generating benchmarks to test the correctness and performance of SMT solvers.

Fuzzing has been used to test all kinds of software including SAT solvers [10]. Inspired by the utility of fuzzers, we introduce StringFuzz and describe its value as an exploratory testing tool. We demonstrate its efficacy by presenting limitations it helped discover in leading string solvers. To the best of our knowledge, StringFuzz is the only tool aimed at automatic generation of string constraints. StringFuzz can be used to mutate or transform existing benchmarks, as well as randomly generate structured instances. These instances can be scaled with respect to a variety of parameters, e.g., length of string constants, depth of concatenations (concats) and regular expressions (regexes), number of variables, number of length constraints, and many more.

# **Contributions**


# **2 StringFuzz**

**Implementation and Architecture.** StringFuzz is implemented as a Python package and comes with several executables to generate, transform, and analyze SMT-LIB 2.0/2.5 string and regex instances. Its components are implemented as UNIX "filters" to enable easy integration with other tools (and with each other). For example, the output of a generator can be piped into a transformer, and transformers can be chained to produce a stream of tuned inputs for a solver. StringFuzz is composed of the following tools:

### stringfuzzg

This tool generates SMT-LIB instances. It supports several generators and options that specify its output. Details can be found in Table 1a.

### stringfuzzx

This tool transforms SMT-LIB instances. It supports several transformers and options that specify its input and output, which are explained in Table 1b. Note that the transformers *Translate* and *Reverse* also preserve satisfiability under certain conditions.

### stringstats

This tool takes an SMT-LIB instance as input and outputs its properties: the number of variables/literals, the max/median syntactic depth of expressions, the max/median literal length, etc.
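For instance, the expression-depth statistic can be approximated by tracking parenthesis nesting (an illustrative computation, not StringFuzz's actual implementation):

```python
# Hedged sketch: computing the maximum syntactic (nesting) depth of an SMT-LIB
# s-expression by scanning parentheses; illustrative, not stringstats's code.

def max_depth(smtlib_text):
    depth = best = 0
    in_string = False
    for ch in smtlib_text:
        if ch == '"':               # ignore parentheses inside string literals
            in_string = not in_string
        elif not in_string:
            if ch == '(':
                depth += 1
                best = max(best, depth)
            elif ch == ')':
                depth -= 1
    return best

assert max_depth('(assert (<= 3 (str.len var0)))') == 3
```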

<sup>1</sup> All source code, problem suites, and supplementary material referenced in this paper are available at the StringFuzz website [3].

**Table 1.** StringFuzz built-in (a) generators and (b) transformers.


(a) stringfuzzg built-in generators.

(b) stringfuzzx built-in transformers.


<sup>a</sup>Can guarantee satisfiable output instances from satisfiable input instances [3]. <sup>b</sup>Can guarantee input and output instances will be equisatisfiable [3].

We organized StringFuzz to be easily extended. To show this, we note that while the whole project contains 3,183 lines of code, it takes an average of only 45 lines of code to create a new transformer. StringFuzz can be installed from source or from the Python PIP package repository.

**Regex Generating Capabilities.** StringFuzz can generate and transform instances with regex constraints. For example, the command "stringfuzzg regex -r 2 -d 1 -t 1 -M 3 -X 10" produces this instance:

```
(set-logic QF_S)
(declare-fun var0 () String)
(assert (str.in.re var0 (re.+ (str.to.re "R5"))))
(assert (str.in.re var0 (re.+ (str.to.re "!PC"))))
(assert (<= 3 (str.len var0)))
(assert (<= (str.len var0) 10))
(check-sat)
```
Each instance is a set of one or more regex constraints on a single variable, with optional maximum and minimum length constraints. Each regex constraint is a concatenation (re.++ in SMT-LIB string syntax) of regex terms:

(re.++ T<sub>1</sub> (re.++ T<sub>2</sub> ... (re.++ T<sub>n−1</sub> T<sub>n</sub>)))

and each term T<sub>i</sub> is recursively defined as any one of: Kleene star (re.\*), one-or-more repetition (re.+), union (re.union), or a character literal. Operators are nested up to a recursion depth specified with the --depth flag. Terms at depth 0 are regex constants. Below are three example regexes (in regex, not SMT-LIB, syntax) of depth 2 that can be produced this way:

`((a|b)|(cc)+)`  `((ddd)*)+`  `((ee)+|(fff)*)`
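The recursive construction above can be sketched as follows. This is a minimal illustration of depth-bounded term generation, not StringFuzz's actual generator; the helper names and the alphabet are assumptions:

```python
import random

def make_term(depth, rng):
    """Build an SMT-LIB regex term whose operator nesting depth is `depth`."""
    if depth == 0:
        # Terms at depth 0 are regex constants.
        literal = "".join(rng.choice("abc") for _ in range(rng.randint(1, 3)))
        return '(str.to.re "%s")' % literal
    sub = make_term(depth - 1, rng)
    op = rng.choice(["star", "plus", "union"])
    if op == "star":
        return "(re.* %s)" % sub
    if op == "plus":
        return "(re.+ %s)" % sub
    return "(re.union %s %s)" % (sub, make_term(depth - 1, rng))

def concat_terms(terms):
    """Right-nest terms with re.++, as in (re.++ T1 (re.++ T2 ...))."""
    regex = terms[-1]
    for term in reversed(terms[:-1]):
        regex = "(re.++ %s %s)" % (term, regex)
    return regex

rng = random.Random(0)
terms = [make_term(2, rng) for _ in range(3)]
print("(assert (str.in.re var0 %s))" % concat_terms(terms))
```

Scaling the depth parameter in such a generator is what produces the progressively harder regex instances used later in the evaluation.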

**Equisatisfiable String Transformations.** StringFuzz can also transform problem instances. This is done by manipulating parsed syntax trees. By default, most of the built-in transformers guarantee only well-formedness of their output; some, however, additionally guarantee equisatisfiability. Table 1b lists the built-in transformers and notes these guarantees.

**Example Use Case.** In Sect. 3 we use StringFuzz to generate benchmark suites in a batch mode. We can also use StringFuzz for on-line exploratory debugging. For example, the script below repeatedly feeds random StringFuzz instances to CVC4 until the solver produces an error:

```
while stringfuzzg -r random-ast -m \
    | tee instance.smt25 | cvc4 --lang smt2.5 --tlimit=5000 --strings-exp; do
    sleep 0
done
```
# **3 Instance Suites**

In this section, we describe the benchmark suites we generated with StringFuzz, and on which we conducted our experimental evaluation. Table 2a lists instances that were generated by stringfuzzg. Table 2b lists instances derived from existing seed instances by iteratively applying stringfuzzx. Every transformed instance is named according to its seed and the transformations it underwent. For example, z3-regex-1-fuzz-graft.smt2 was transformed by applying *Fuzz* and then *Graft* to z3-regex-1.smt2.

The *Amazon* category contains 472 instances derived from two seeds supplied by our industrial collaborators. The *Regex* category is seeded by the Z3str2 regex test suite [4], which contains 42 instances. Through cumulative transformations we expanded the 42 seeds to 7,551 unique instances. Finally, the *Sanitizer* category is obtained from five industrial e-mail address and IPv4 sanitizers.

# **4 Experimental Results and Analysis**

We generated several problem instance suites with StringFuzz that made one solver perform poorly, but not others.<sup>2</sup> They are *Concats-Balanced*, *Concats-Big*, *Concats-Extracts-Small*, and *Different-Prefix*. Figure 1 shows the suites that were uniquely difficult for CVC4; Fig. 2 shows the suites that were uniquely difficult for Z3str3. All experiments were conducted in series, each with a timeout of 15 s, on an Ubuntu Linux 16.04 computer with 32 GB of RAM and an Intel® Core™ i7-6700 CPU (3.40 GHz).

<sup>2</sup> Only the results that made one solver perform poorly and not others are presented, but results for all StringFuzz suites are available on the StringFuzz website [3].

(a) stringfuzzg-generated instances.

(b) stringfuzzx-generated instances.

**Fig. 1.** Instances hard for CVC4: (a) performance on *Concats-Extracts-Small*; (b) performance on *Different-Prefix*.

**Usefulness to Z3str3: A Case Study.** StringFuzz's ability to produce scaling instances helped uncover several implementation issues and performance limitations in Z3str3. Scaling inputs can reveal issues that would normally be out of scope for unit tests or industrial benchmarks. Three different performance and implementation bugs were identified and fixed in Z3str3 as a result of testing with the StringFuzz scaling suites *Lengths-Long* and *Concats-Big*.

**Fig. 2.** Instances hard for Z3str3: (a) performance on *Concats-Balanced*; (b) performance on *Concats-Big*.

StringFuzz also helped identify a number of performance-related issues and opportunities for new heuristics in Z3str3. For example, by examining Z3str3's execution traces on the instances in the *Concats-Big* suite we discovered a potential new heuristic. In particular, Z3str3 does not make full use of the solving context (e.g., that some terms are empty strings) to simplify the concatenation of a long list of string terms before trying to reason about the equivalences among subterms. Z3str3 therefore introduces a large number of unnecessary intermediate variables and propagations.

# **5 Related Work**

Many solver developers create their own test suites to validate their solvers [1, 4,5]. Several popular instance suites are also publicly available for solver testing and benchmarking, such as the Kaluza [2] and Kausler [11] suites. There are likewise several fuzzers and instance generators currently available, but none of them can generate or transform string and regex instances. For example, the FuzzSMT [9] tool generates SMT-LIB instances with bit-vectors and arrays, but does not support strings or regexes. The SMTpp [8] tool pre-processes and simplifies instances, but does not generate new ones or fuzz existing ones.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Static Analysis

# **Permission Inference for Array Programs**

Jérôme Dohrau, Alexander J. Summers, Caterina Urban, Severin Münger, and Peter Müller

> Department of Computer Science, ETH Zurich, Zurich, Switzerland {jerome.dohrau,alexander.summers, caterina.urban,peter.mueller}@inf.ethz.ch, severin.muenger@alumni.ethz.ch

**Abstract.** Information about the memory locations accessed by a program is, for instance, required for program parallelisation and program verification. Existing inference techniques for this information provide only partial solutions for the important class of array-manipulating programs. In this paper, we present a static analysis that infers the memory footprint of an array program in terms of permission pre- and postconditions as used, for example, in separation logic. This formulation allows our analysis to handle concurrent programs and produces specifications that can be used by verification tools. Our analysis expresses the permissions required by a loop via maximum expressions over the individual loop iterations. These maximum expressions are then solved by a novel maximum elimination algorithm, in the spirit of quantifier elimination. Our approach is sound and is implemented; an evaluation on existing benchmarks for memory safety of array programs demonstrates accurate results, even for programs with complex access patterns and nested loops.

# **1 Introduction**

Information about the memory locations accessed by a program is crucial for many applications such as static data race detection [45], code optimisation [16,26,33], program parallelisation [5,17], and program verification [23,30,38,39]. The problem of inferring this information statically has been addressed by a variety of static analyses, e.g., [9,42]. However, prior works provide only partial solutions for the important class of array-manipulating programs for at least one of the following reasons. (1) They approximate the entire array as one single memory location [4] which leads to imprecise results; (2) they do not produce specifications, which are useful for several important applications such as human inspection, test case generation, and especially deductive program verification; (3) they are limited to sequential programs.

In this paper, we present a novel analysis for array programs that addresses these shortcomings. Our analysis employs the notion of *access permission* from separation logic and similar program logics [40,43]. These logics associate a permission with each memory location and enforce that a program part accesses a location only if it holds the associated permission. In this setting, determining the accessed locations means inferring a sufficient precondition that specifies the permissions required by a program part.

Phrasing the problem as one of permission inference allows us to address the three problems mentioned above. (1) We distinguish different array elements by tracking the permission for each element separately. (2) Our analysis infers pre- and postconditions for both methods and loops and emits them in a form that can be used by verification tools. The inferred specifications can easily be complemented with permission specifications for non-array data structures and with functional specifications. (3) We support concurrency in three important ways. First, our analysis is sound for concurrent program executions because permissions guarantee that program executions are data race free and reduce thread interactions to specific points in the program such as forking or joining a thread, or acquiring or releasing a lock. Second, we develop our analysis for a programming language with primitives that represent the ownership transfer that happens at these thread interaction points. These primitives, inhale and exhale [31,38], express that a thread obtains permissions (for instance, by acquiring a lock) or loses permissions (for instance, by passing them to another thread along with a message) and can thereby represent a wide range of thread interactions in a uniform way [32,44]. Third, our analysis distinguishes read and write access and, thus, ensures exclusive writes while permitting concurrent read accesses. As is standard, we employ *fractional permissions* [6] for this purpose; a full permission is required to write to a location, but any positive fraction permits read access.

**Approach.** Our analysis reduces the problem of reasoning about permissions for array elements to reasoning about numerical values for permission fractions. To achieve this, we represent the permission fractions for all array elements *q<sub>a</sub>*[*q<sub>i</sub>*] using a *single* numerical expression *t*(*q<sub>a</sub>*, *q<sub>i</sub>*) parameterised by *q<sub>a</sub>* and *q<sub>i</sub>*. For instance, the conditional term (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? 1 : 0) represents full permission (denoted by 1) for array element a[j] and no permission for all other array elements.

Our analysis employs a *precise* backwards analysis for *loop-free* code: a variation on the standard notion of weakest preconditions. We apply this analysis to loop bodies to obtain a permission precondition for a single loop iteration. Per array element, the *whole loop* requires the *maximum* fraction over all loop iterations, adjusted by permissions gained and lost during loop execution. Rather than computing permissions via a fixpoint iteration (for which a precise widening operator is difficult to design), we express them as a maximum over the variables changed by the loop execution. We then use inferred numerical invariants on these variables and a novel *maximum elimination* algorithm to infer a specification for the entire loop. Permission postconditions are obtained analogously.

For the method copyEven in Fig. 1, the analysis determines that the permission amount required by a single loop iteration is (j%2=0 ? (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? rd : 0) : (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? 1 : 0)). The symbol rd represents a fractional read permission. Using a suitable integer invariant for the loop counter j, we obtain the loop precondition

**Fig. 3.** Programming Language. *n* ranges over integer constants, *x* over integer variables, *a* over array variables, *q* over non-negative fractional (permission-typed) constants. *e* stands for integer expressions, and *b* for boolean. Permission expressions *p* are a separate syntactic category.

max<sub>j | 0 ≤ j < len(a)</sub> ((j%2=0 ? (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? rd : 0) : (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? 1 : 0))). Our maximum elimination algorithm obtains (*q<sub>a</sub>*=a ∧ 0 ≤ *q<sub>i</sub>* < len(a) ? (*q<sub>i</sub>*%2=0 ? rd : 1) : 0). By ranging over all *q<sub>a</sub>* and *q<sub>i</sub>*, this can be read as read permission for even indices and write permission for odd indices within the bounds of array a.
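The pointwise maximum and its eliminated closed form can be cross-checked by brute force. The sketch below assumes (for illustration only) a concrete array length and a concrete positive fraction standing in for the symbolic read permission rd:

```python
from fractions import Fraction

RD = Fraction(1, 4)   # stand-in for rd (any positive fraction below 1)
LEN = 10              # assumed len(a)

def per_iteration(qa, qi, j):
    """Permission required by loop iteration j of copyEven at (qa, qi)."""
    if j % 2 == 0:
        return RD if (qa == "a" and qi == j) else Fraction(0)
    return Fraction(1) if (qa == "a" and qi == j) else Fraction(0)

def pointwise_max(qa, qi):
    """Max over all iterations 0 <= j < len(a); an empty range yields 0."""
    return max((per_iteration(qa, qi, j) for j in range(LEN)),
               default=Fraction(0))

def eliminated(qa, qi):
    """Closed form produced by maximum elimination."""
    if qa == "a" and 0 <= qi < LEN:
        return RD if qi % 2 == 0 else Fraction(1)
    return Fraction(0)

# The two formulations agree at every sampled (qa, qi) pair.
for qa in ("a", "b"):
    for qi in range(-2, LEN + 2):
        assert pointwise_max(qa, qi) == eliminated(qa, qi)
print("pointwise maximum and eliminated form agree")
```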

**Contributions.** The contributions of our paper are:


### **2 Programming Language**

We define our inference technique over the programming language in Fig. 3. Programs operate on integers (expressions *e*), booleans (expressions *b*), and one-dimensional integer arrays (variables *a*); a generalisation to other forms of arrays is straightforward and supported by our implementation. Arrays are read and updated via the statements *x* := *a*[*e*] and *a*[*e*] := *x*; array lookups in expressions are not part of the surface syntax, but are used internally by our analysis. Permission expressions *p* evaluate to rational numbers; rd, min, and max are for internal use.

A full-fledged programming language contains many statements that affect the ownership of memory locations, expressed via permissions [32,44]. For example in a concurrent setting, a fork operation may transfer permissions to the new thread, acquiring a lock obtains permission to access certain memory locations, and messages may transfer permissions between sender and receiver. Even in a sequential setting, the concept is useful: in procedure-modular reasoning, a method call transfers permissions from the caller to the callee, and back when the callee terminates. Allocation can be represented as obtaining a fresh object and then obtaining permission to its locations.

For the purpose of our permission inference, we can reduce all of these operations to two basic statements that directly manipulate the permissions currently held [31,38]. An inhale(*a, e, p*) statement adds the amount *p* of permission for the array location *a*[*e*] to the currently held permissions. Dually, an exhale(*a, e, p*) statement requires that this amount of permission is *already* held, and then removes it. We assume that for any inhale or exhale statements, the permission expression *p* denotes a non-negative fraction. For simplicity, we restrict inhale and exhale statements to a *single* array location, but the extension to unboundedly-many locations from the same array is straightforward [37].

**Semantics.** The operational semantics of our language is mostly standard, but is instrumented with additional state to track how much permission is held to each heap location; a program state therefore consists of a triple of heap *H* (mapping pairs of array identifier and integer index to integer values), a *permission map P*, mapping such pairs to *permission amounts*, and an environment *σ* mapping variables to values (integers or array identifiers).

The execution of inhale or exhale statements causes modifications to the permission map, and all array accesses are guarded with checks that *at least some* permission is held when reading and that full (1) permission is held when writing [6]. If these checks (or an exhale statement) fail, the execution terminates with a *permission failure*. Permission amounts greater than 1 indicate invalid states that cannot be reached by a program execution. We model run-time errors other than permission failures (in particular, out-of-bounds accesses) as stuck configurations.
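The instrumented state and its permission checks can be captured in a minimal executable sketch; the class and method names below are illustrative assumptions, not the paper's formalisation:

```python
from fractions import Fraction

class PermissionFailure(Exception):
    """Raised when an access or exhale lacks the required permission."""

class State:
    """Instrumented state: heap H and permission map P over (array, index)."""
    def __init__(self):
        self.P = {}   # permission map: (array id, index) -> fraction
        self.H = {}   # heap: (array id, index) -> integer value (0 if unset)

    def perm(self, a, i):
        return self.P.get((a, i), Fraction(0))

    def inhale(self, a, i, p):
        # Adds the amount p of permission for a[i].
        self.P[(a, i)] = self.perm(a, i) + p

    def exhale(self, a, i, p):
        # Requires that p is already held, then removes it.
        if self.perm(a, i) < p:
            raise PermissionFailure("exhale of more than is held")
        self.P[(a, i)] = self.perm(a, i) - p

    def read(self, a, i):
        if self.perm(a, i) <= 0:   # any positive fraction permits reads
            raise PermissionFailure("read without permission")
        return self.H.get((a, i), 0)

    def write(self, a, i, v):
        if self.perm(a, i) < 1:    # writes require the full permission
            raise PermissionFailure("write without full permission")
        self.H[(a, i)] = v

s = State()
s.inhale("a", 0, Fraction(1, 2))
s.read("a", 0)                     # allowed under a half permission
try:
    s.write("a", 0, 7)             # must fail: only 1/2 is held
    raise AssertionError("unreachable")
except PermissionFailure:
    pass
s.inhale("a", 0, Fraction(1, 2))   # now 1 is held
s.write("a", 0, 7)                 # write succeeds
print(s.read("a", 0))
```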

# **3 Permission Inference for Loop-Free Code**

Our analysis infers a sufficient permission precondition and a guaranteed permission postcondition for each method of a program. Both conditions are mappings from array elements to permission amounts. Executing a statement *s* in a state whose permission map *P* contains at least the permissions required by a *sufficient permission precondition* for *s* is guaranteed to not result in a permission failure. A *guaranteed permission postcondition* expresses the permissions that will at least be held when *s* terminates (see Sect. A of the TR [15] for formal definitions).

**Fig. 4.** The backwards analysis rules for permission preconditions and relative permission differences. The notation *α<sub>a,e</sub>*(*p*) is a shorthand for (*q<sub>a</sub>*=*a* ∧ *q<sub>i</sub>*=*e* ? *p* : 0) and denotes *p* permission for the array location *a*[*e*]. Moreover, *p*[*a′*[*e′*] → *e*] matches all array accesses in *p* and replaces them with the expression obtained from *e* by substituting all occurrences of *a′* and *e′* with the matched array and index, respectively. The cases for inhale statements are slightly simplified; the full rules are given in Fig. 6 of the TR [15].

In this section, we define inference rules to compute sufficient permission preconditions for loop-free code. For programs which do not add or remove permissions via inhale and exhale statements, the same permissions will still be held after executing the code; however, to infer guaranteed permission postconditions in the general case, we also infer the difference in permissions between the state before and after the execution. We will discuss loops in the next section. Non-recursive method calls can be handled by applying our analysis bottom-up in the call graph and using inhale and exhale statements to model the permission effect of calls. Recursion can be handled similarly to loops, but is omitted here.

We define our permission analysis to track and generate *permission expressions* parameterised by two distinguished variables *q<sub>a</sub>* and *q<sub>i</sub>*; by parameterising our expressions in this way, we can use a single expression to represent a permission amount for each pair of *q<sub>a</sub>* and *q<sub>i</sub>* values.

**Preconditions.** The *permission precondition* of a loop-free statement *s* and a postcondition permission *p* (in which *q<sub>a</sub>* and *q<sub>i</sub>* potentially occur) is denoted by *pre*(*s*, *p*), and is defined in Fig. 4. Most rules are straightforward adaptations of a classical weakest-precondition computation. Array lookups require some permission to the accessed array location; we use the internal expression rd to denote a non-zero permission amount; a post-processing step can later replace rd by a concrete rational. Since downstream code may require further permission for this location, represented by the permission expression *p*, we take the maximum of both amounts. Array updates require full permission and need to take aliasing into account. The case for inhale subtracts the inhaled permission amount from the permissions required by downstream code; the case for exhale adds the permissions to be exhaled. Note that this addition may lead to a required permission amount exceeding the full permission. This indicates that the statement is not feasible, that is, all executions will lead to a permission failure.

To illustrate our *pre* definition, let *s* be the body of the loop in the parCopyEven method in Fig. 2. The precondition *pre*(*s*, 0) = (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j ? 1/2 : 0) + (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j+1 ? 1 : 0) expresses that a loop iteration requires a half permission for the even elements of array a and full permission for the odd elements.
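The backward *pre* computation can be evaluated pointwise at a concrete (*q<sub>a</sub>*, *q<sub>i</sub>*) pair. The sketch below reconstructs a plausible loop body for parCopyEven (a read of a[2j], a write of a[2j+1], then exhales of the stated amounts); this reconstruction, and the simplification that *p* never mentions the heap (so the substitution part of the rules can be skipped), are assumptions for illustration:

```python
from fractions import Fraction

RD = Fraction(1, 100)  # concrete stand-in for the symbolic read permission rd

def pre_pointwise(stmts, qa, qi):
    """Evaluate pre(stmts, 0) at a concrete (qa, qi), walking the
    statement list backwards in the style of the rules of Fig. 4."""
    p = Fraction(0)
    for st in reversed(stmts):
        kind, a, e = st[0], st[1], st[2]
        if (qa, qi) != (a, e):
            continue  # each rule only affects the accessed location
        if kind == "read":        # x := a[e] needs some permission
            p = max(RD, p)
        elif kind == "write":     # a[e] := x needs full permission
            p = max(Fraction(1), p)
        elif kind == "inhale":    # inhaled permission reduces the demand
            p = max(Fraction(0), p - st[3])
        elif kind == "exhale":    # exhaled permission adds to the demand
            p = p + st[3]
    return p

j = 3  # an arbitrary loop iteration
body = [
    ("read", "a", 2 * j),
    ("write", "a", 2 * j + 1),
    ("exhale", "a", 2 * j, Fraction(1, 2)),
    ("exhale", "a", 2 * j + 1, Fraction(1)),
]
assert pre_pointwise(body, "a", 2 * j) == Fraction(1, 2)
assert pre_pointwise(body, "a", 2 * j + 1) == Fraction(1)
assert pre_pointwise(body, "a", 0) == Fraction(0)
print("pre(s, 0) matches the expression from the text")
```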

**Postconditions.** The final state of a method execution includes the permissions held in the method pre-state, adjusted by the permissions that are inhaled or exhaled during the method execution. To perform this adjustment, we compute the difference in permissions before and after executing a statement. The *relative permission difference* for a loop-free statement *s* and a permission expression *p* (in which *q<sub>a</sub>* and *q<sub>i</sub>* potentially occur) is denoted by Δ(*s*, *p*), and is defined backward, analogously to *pre* in Fig. 4. The second parameter *p* acts as an accumulator; the difference in permission is represented by evaluating Δ(*s*, 0).

For a statement *s* with precondition *pre*(*s*, 0), we obtain the postcondition *pre*(*s*, 0) + Δ(*s*, 0). Let *s* again be the loop body from parCopyEven. Since *s* contains exhale statements, we obtain Δ(*s*, 0) = 0 − (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j ? 1/2 : 0) − (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j+1 ? 1 : 0). Thus, the postcondition *pre*(*s*, 0) + Δ(*s*, 0) can be simplified to 0. This reflects the fact that all required permissions for a single loop iteration are lost by the end of its execution.

Since our Δ operator performs a backward analysis, our permission postconditions are expressed in terms of the pre-state of the execution of *s*. To obtain classical postconditions, any heap accesses need to refer to the pre-state heap, which can be achieved in program logics by using **old** expressions or logical variables. Formalizing the postcondition inference as a backward analysis simplifies our treatment of loops and has technical advantages over classical strongest-postconditions, which introduce existential quantifiers for assignment statements. A limitation of our approach is that our postconditions cannot capture situations in which a statement obtains permissions to locations for which no pre-state expression exists, e.g. allocation of new arrays. Our postconditions are sound; to make them precise for such cases, our inference needs to be combined with an additional forward analysis, which we leave as future work.

# **4 Handling Loops via Maximum Expressions**

In this section, we first focus on obtaining a sufficient permission precondition for the execution of a loop in isolation (independently of the code after it) and then combine the inference for loops with the one for loop-free code described above.

#### **4.1 Sufficient Permission Preconditions for Loops**

A sufficient permission precondition for a loop guarantees the absence of permission failures for a potentially unbounded number of executions of the loop body. This concept is different from a loop invariant: we require a precondition for all executions of a particular loop, but it need not be inductive. Our technique obtains such a loop precondition by projecting a permission precondition for a single loop iteration over all possible initial states for the loop executions.

**Exhale-Free Loop Bodies.** We consider first the simpler (but common) case of a loop that does not contain exhale statements, e.g., does not transfer permissions to a forked thread. The solution for this case is also sound for loop bodies where each exhale is followed by an inhale for the same array location and at least the same permission amount, as in the encoding of most method calls.

Consider a sufficient permission precondition *p* for the body of a loop while (*b*) { *s* }. By definition, *p* will denote sufficient permissions to execute *s* once; the precise locations to which *p* requires permission depend on the initial state of the loop iteration. For example, the sufficient permission precondition for the body of the copyEven method in Fig. 1, (j%2=0 ? (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? rd : 0) : (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? 1 : 0)), requires permissions to different array locations, depending on the value of j. To obtain a sufficient permission precondition for the entire loop, we leverage an *over-approximating* loop invariant I⁺ from an off-the-shelf numerical analysis (e.g., [13]) to over-approximate all possible values of the numerical variables that get assigned in the loop body, here, j. We can then express the loop precondition using the *pointwise maximum* max<sub>j | I⁺ ∧ b</sub>(*p*) over the values of j that satisfy the condition I⁺ ∧ *b*. (The maximum over an empty range is defined to be 0.) For the copyEven method, given the invariant 0 ≤ j ≤ len(a), the loop precondition is max<sub>j | 0 ≤ j < len(a)</sub>(*p*).

In general, a permission precondition for a loop body may also depend on array *values*, e.g., if those values are used in branch conditions. To avoid the need for an expensive array value analysis, we define both an over- and an underapproximation of permission expressions, denoted *p*<sup>↑</sup> and *p*<sup>↓</sup> (cf. Sect. A.1 of the TR [15]), with the guarantees that *<sup>p</sup>* <sup>≤</sup> *<sup>p</sup>*<sup>↑</sup> and *<sup>p</sup>*<sup>↓</sup> <sup>≤</sup> *<sup>p</sup>*. These approximations abstract away array-dependent conditions, and have an impact on precision only when array values are used to determine a location to be accessed. For example, a linear array search for a particular value accesses the array only up to the (a-priori unknown) point at which the value is found, but our permission precondition conservatively requires access to the full array.

**Theorem 1.** *Let* while (*b*) { *s* } *be an exhale-free loop, let x̄ be the integer variables modified by s, and let* I⁺ *be a sound over-approximating numerical loop invariant (over the integer variables in s). Then* max<sub>x̄ | I⁺ ∧ b</sub>(*pre*(*s*, 0)↑) *is a sufficient permission precondition for* while (*b*) { *s* }*.*

**Loops with Exhale Statements.** For loops that contain exhale statements, the approach described above does not always guarantee a sufficient permission precondition. For example, if a loop gives away full permission to the *same* array location in every iteration, our pointwise maximum construction yields a precondition requiring the full permission once, as opposed to the *unsatisfiable* precondition (since the loop is guaranteed to cause a permission failure).

As explained above, our inference is sound if each exhale statement is followed by a corresponding inhale, which can often be checked syntactically. In the following, we present another decidable condition that guarantees soundness and that can be checked efficiently by an SMT solver. If neither condition holds, we preserve soundness by inferring an unsatisfiable precondition; we did not encounter any such examples in our evaluation.

Our soundness condition checks that the maximum of the permissions required by two loop iterations is not less than the permissions required by executing the two iterations in sequence. Intuitively, that is the case when neither iteration removes permissions that are required by the other iteration.

**Theorem 2 (Soundness Condition for Loop Preconditions).** *Given a loop* while (*b*) { *s* }*, let x̄ be the integer variables modified in s and let v̄ and v̄′ be two fresh sets of variables, one for each of x̄. Then* max<sub>x̄ | I⁺ ∧ b</sub>(*pre*(*s*, 0)↑) *is a sufficient permission precondition for* while (*b*) { *s* } *if the following implication is valid in all states:*

$$\begin{array}{c} (\mathcal{I}^+ \wedge b)[\overline{v}/\overline{x}] \wedge (\mathcal{I}^+ \wedge b)[\overline{v}'/\overline{x}] \wedge \left(\bigvee \overline{v} \neq \overline{v}'\right) \Rightarrow\\ \max\left(pre(s,0)^\uparrow[\overline{v}/\overline{x}],\ pre(s,0)^\uparrow[\overline{v}'/\overline{x}]\right) \ge pre\left(s,\ pre(s,0)^\uparrow[\overline{v}'/\overline{x}]\right)^\uparrow[\overline{v}/\overline{x}] \end{array}$$

The additional variables v̄ and v̄′ are used to model two arbitrary valuations of x̄; we constrain these to represent two initial states allowed by I⁺ ∧ *b* and different from each other for at least one program variable. We then require that the effect of analysing each loop iteration independently and taking the maximum is not smaller than the effect of sequentially composing the two loop iterations.

The theorem requires implicitly that no two different iterations of a loop observe exactly the same values for all integer variables. If that could be the case, the condition v̄ ≠ v̄′ would cause us to ignore a potential pair of initial states for two different loop iterations. To avoid this problem, we assume that all loops satisfy this requirement; it can easily be enforced by adding an additional variable as loop iteration counter [21].

For the parCopyEven method (Fig. 2), the soundness condition holds since, due to the v̄ ≠ v̄′ condition, the two terms on the right of the implication are equal for all values of *q<sub>i</sub>*. We can thus infer a sufficient precondition as max<sub>j | 0 ≤ j < len(a)/2</sub>((*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j ? 1/2 : 0) + (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j+1 ? 1 : 0)).
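For parCopyEven, the soundness condition can also be checked by brute force over concrete iteration pairs. The sketch below assumes, as an illustration, that iteration j requires 1/2 permission for a[2j] and full permission for a[2j+1], and holds nothing at the end of the iteration (it exhales everything it required), so sequential composition of two iterations needs the sum of their requirements:

```python
from fractions import Fraction

HALF, ONE, ZERO = Fraction(1, 2), Fraction(1), Fraction(0)
LEN = 10  # assumed len(a); iterations range over 0 <= j < LEN // 2

def req(j, qi):
    """Permission iteration j requires for a[qi]."""
    if qi == 2 * j:
        return HALF
    if qi == 2 * j + 1:
        return ONE
    return ZERO

for j in range(LEN // 2):
    for j2 in range(LEN // 2):
        if j == j2:
            continue  # Theorem 2 only constrains distinct valuations
        for qi in range(LEN):
            # Iteration j holds nothing back, so running j then j2
            # needs req(j) + req(j2) at each location.
            sequential = req(j, qi) + req(j2, qi)
            assert max(req(j, qi), req(j2, qi)) >= sequential
print("soundness condition holds for all distinct iteration pairs")
```

The assertion holds with equality because distinct iterations touch disjoint locations, matching the remark that the two terms on the right of the implication coincide.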

### **4.2 Permission Inference for Loops**

We can now extend the pre- and postcondition inference from Sect. 3 with loops. *pre*(while (*b*) { *s* }, *p*) must require permissions such that (1) the loop executes without permission failure and (2) at least the permissions described by *p* are held when the loop terminates. While the former is provided by the loop precondition as defined in the previous subsection, the latter also depends on the permissions gained or lost during the execution of the loop. To characterise these permissions, we extend the Δ operator from Sect. 3 to handle loops.

Under the soundness condition from Theorem 2, we can mimic the approach from the previous subsection and use over-approximating invariants to project out the permissions *lost* in a single loop iteration (where Δ(*s,* 0) is negative) to those lost by the entire loop, using a maximum expression. This projection conservatively assumes that the permissions lost in a single iteration are lost by all iterations whose initial state is allowed by the loop invariant and loop condition. This approach is a sound over-approximation of the permissions *lost*.

However, for the permissions *gained* by a loop iteration (where Δ(*s*, 0) is positive), this approach would be unsound because the over-approximation includes iterations that may not actually happen and, thus, permissions that are not actually gained. For this reason, our technique handles gained permissions via an *under-approximate*<sup>1</sup> numerical loop invariant I⁻ (e.g., [35]) and thus projects the gained permissions only over iterations that will surely happen.

This approach is reflected in the definition of our Δ operator below via *d*, which represents the permissions *possibly lost* or *definitely gained* over all iterations of the loop. In the former case, we have Δ(*s,* 0) *<* 0 and, thus, the first summand is 0 and the computation based on the over-approximate invariant applies (note that the negated maximum of negated values is the minimum; we take the minimum over negative values). In the latter case (Δ(*s,* 0) *>* 0), the second summand is 0 and the computation based on the under-approximate invariant applies (we take the maximum over positive values).

$$\begin{array}{c} \Delta(\mathtt{while}\ (b)\ \{\ s\ \},\ p) = (b\ ?\ d + p' : p),\ \text{where:}\\ d = \max_{\overline{x}|\mathcal{I}^- \wedge b}\left(\max(0, \Delta(s,0))^\downarrow\right) - \max_{\overline{x}|\mathcal{I}^+ \wedge b}\left(\max(0, -\Delta(s,0))^\uparrow\right)\\ p' = \max_{\overline{x}|\mathcal{I}^- \wedge \neg b}\left(\max(0, p)^\downarrow\right) - \max_{\overline{x}|\mathcal{I}^+ \wedge \neg b}\left(\max(0, -p)^\uparrow\right) \end{array}$$

*x̄* denotes again the integer variables modified in *s*. The role of *p′* is to carry over the permissions *p* that are gained or lost by the code following the loop, taking into account any state changes performed by the loop. Intuitively, the maximum expressions replace the variables *x̄* in *p* with expressions that do not depend on these variables but nonetheless reflect properties of their values right after the execution of the loop. For permissions gained, these properties are based on the under-approximate loop invariant to ensure that they hold for any possible loop execution. For permissions lost, we use the over-approximate invariant. For the loop in parCopyEven we use the invariant 0 ≤ j ≤ len(a)*/*2 to obtain *d* = −max<sub>j | 0≤j<len(a)/2</sub>((*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j ? 1*/*2 : 0) + (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=2∗j+1 ? 1 : 0)). Since there are no statements following the loop, *p* and therefore *p′* are 0.
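As a concrete sanity check, the *d* term for parCopyEven can be evaluated by brute-force enumeration over the iterations permitted by the invariant. The following Python sketch is an illustrative assumption (the function name and the concrete array length are not from the paper): it negates the pointwise maximum of the per-iteration losses at a[2j] (one half permission) and a[2j+1] (full permission).

```python
from fractions import Fraction

def d_parCopyEven(length, qi):
    """Brute-force evaluation of d for a queried index qi (qa = a assumed):
    the negated pointwise maximum, over iterations j allowed by the
    invariant 0 <= j < len(a)/2, of the permission lost in iteration j."""
    losses = [(Fraction(1, 2) if qi == 2 * j else 0) +
              (Fraction(1, 1) if qi == 2 * j + 1 else 0)
              for j in range(length // 2)]
    return -max(losses, default=0)
```

For a length-10 array, an even index below the bound loses 1/2 permission, an odd one loses the full permission, and out-of-range indices lose nothing.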

Using the same *d* term, we can now define the general case of *pre* for loops, combining (1) the loop precondition and (2) the permissions required by the code after the loop, adjusted by the permissions gained or lost during loop execution:

<sup>1</sup> An under-approximate loop invariant must be true *only* for states that will actually be encountered when executing the loop.

$$pre(\mathtt{while}\ (b)\ \{\ s\ \}, p) = \Bigl(b\ ?\ \max\bigl(\max\_{\overline{x}\,|\,I^{+} \wedge b} pre(s, 0)^{\uparrow},\ \max\_{\overline{x}\,|\,I^{+} \wedge \neg b} (p^{\uparrow}) - d\bigr) : p\Bigr)$$

Similarly to *p′* in the rule for Δ, the expression max<sub>x̄ | I⁺∧¬b</sub>(*p*<sup>↑</sup>) conservatively over-approximates the permissions required to execute the code after the loop. For method parCopyEven, we obtain a sufficient precondition that is the negation of the Δ computed above. Consequently, the postcondition is 0.

**Soundness.** Our *pre* and Δ definitions yield a sound method for computing sufficient permission preconditions and guaranteed postconditions:

**Theorem 3 (Soundness of Permission Inference).** *For any statement s, if every* while *loop in s either is exhale-free or satisfies the condition of Theorem 2 then pre*(*s,* 0) *is a sufficient permission precondition for s, and pre*(*s,* 0)+Δ(*s,* 0) *is a corresponding guaranteed permission postcondition.*

Our inference expresses pre- and postconditions using a maximum operator over an unbounded set of values. However, this operator is not supported by SMT solvers. To make the inferred conditions usable for SMT-based verification, we provide an algorithm for eliminating these operators, as we discuss next.

# **5 A Maximum Elimination Algorithm**

We now present a new algorithm for replacing maximum expressions over an unbounded set of values (called *pointwise maximum expressions* in the following) with equivalent expressions containing no pointwise maximum expressions. Technically, our algorithm computes solutions to max<sub>x | b∧p≥0</sub>(*p*), since some optimisations exploit the fact that the permission expressions our analysis generates always denote non-negative values.

### **5.1 Background: Quantifier Elimination**

Our algorithm builds upon ideas from Cooper's classic *quantifier elimination* algorithm [11] which, given a formula ∃*x.b* (where *b* is a quantifier-free Presburger formula), computes an equivalent quantifier-free formula *b′*. Below, we give a brief summary of Cooper's approach.

The problem is first reduced via boolean and arithmetic manipulations to a formula <sup>∃</sup>*x.b* in which *<sup>x</sup>* occurs at most once per literal and with no coefficient. The key idea is then to reduce <sup>∃</sup>*x.b* to a disjunction of two cases: (1) there is a *smallest* value of *x* making *b* true, or (2) *b* is true for *arbitrarily small* values of *x*.

In case (1), one computes a *finite* set of expressions *S* (the *b<sub>i</sub>* in [11]) guaranteed to include the smallest value of *x*. For each (in/dis-)equality literal containing *x* in *b*, one collects a *boundary expression e* which denotes a value for *x* making the literal true, while the value *e* − 1 would make it false. For example, for the literal *y* < *x* one generates the expression *y* + 1. If there are no (non-)divisibility constraints in *b*, by definition, *S* will include the smallest value of *x* making *b* true. To account for (non-)divisibility constraints such as *x*%2=0, the lowest common multiple *δ* of the divisors (and 1) is returned along with *S*; the guarantee is then that the smallest value of *x* making *b* true will be *e* + *d* for some *e* ∈ *S* and *d* ∈ [0*, δ* − 1]. We use *b<sub>small</sub>*(*x*) to denote the function handling this computation. Then, ∃*x.b* can be reduced to ⋁<sub>e∈S, d∈[0,δ−1]</sub> *b*[*e* + *d/x*], where (*S, δ*) = *b<sub>small</sub>*(*x*).
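Case (1) can be sketched for a toy literal language. The encoding below (`'gt'`, `'eq'`, and `'div'` tuples, and the names `b_small` and `exists_x`) is a hypothetical mini-representation, not Cooper's full algorithm; it covers only conjunctions in which some literal forces a smallest value of *x*.

```python
from math import lcm

def holds(lit, x):
    """Evaluate one literal at x: ('gt', c) is x > c, ('eq', c) is x == c,
    ('div', m, r) is x % m == r."""
    kind = lit[0]
    if kind == 'gt':
        return x > lit[1]
    if kind == 'eq':
        return x == lit[1]
    return x % lit[1] == lit[2]

def b_small(literals):
    """Collect boundary expressions: values e where a literal holds at e but
    fails at e - 1. Divisibility literals only contribute to delta (lcm)."""
    S, delta = [], 1
    for lit in literals:
        if lit[0] == 'gt':
            S.append(lit[1] + 1)      # c < x first holds at c + 1
        elif lit[0] == 'eq':
            S.append(lit[1])          # x == c holds at c, fails at c - 1
        else:
            delta = lcm(delta, lit[1])
    return S, delta

def exists_x(literals):
    """Reduce the existential to a finite disjunction over e + d,
    with e in S and d in [0, delta - 1]."""
    S, delta = b_small(literals)
    return any(all(holds(l, e + d) for l in literals)
               for e in S for d in range(delta))
```

For example, `exists_x([('gt', 3), ('div', 2, 0)])` finds the witness 4, while `exists_x([('eq', 5), ('div', 2, 0)])` correctly reports unsatisfiability.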

In case (2), one can observe that the (in/dis-)equality literals in *b* flip value at finitely many values of *x*, and so for *sufficiently small* values of *x*, *each* (in/dis-)equality literal in *b* has a constant value (e.g., *y* > *x* will be true). By replacing these literals with these constant values, one obtains a new expression *b′* equal to *b* for small enough *x*, and which depends on *x* only via (non-)divisibility constraints. The value of *b′* will therefore actually be determined by *x* mod *δ*, where *δ* is the lowest common multiple of the divisors in the (non-)divisibility constraints. We use *b*<sub>−∞</sub>(*x*) to denote the function handling this computation. Then, ∃*x.b* can be reduced to ⋁<sub>d∈[0,δ−1]</sub> *b′*[*d/x*], where (*b′, δ*) = *b*<sub>−∞</sub>(*x*).
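Case (2) admits a similarly small sketch: inequality literals are replaced by their limiting truth values as *x* → −∞, only divisibility constraints survive, and the existence check then tries the residues modulo *δ*. The literal encoding and function names are again illustrative assumptions.

```python
from math import lcm

def b_minus_inf(literals):
    """Compute (b', delta) for x -> -inf. Hypothetical literal shapes:
    ('gt', c) for x > c, ('lt', c) for x < c, ('div', m, r) for x % m == r."""
    delta = 1
    residual = []                        # literals surviving as x -> -inf
    for lit in literals:
        if lit[0] == 'gt':               # x > c is eventually false
            return (lambda x: False), 1
        if lit[0] == 'lt':               # x < c is eventually true: drop it
            continue
        delta = lcm(delta, lit[1])
        residual.append(lit)
    bp = lambda x: all(x % m == r for (_, m, r) in residual)
    return bp, delta

def exists_x_case2(literals):
    """Existence for the 'arbitrarily small' case: check b'[d/x]
    for d in [0, delta - 1]."""
    bp, delta = b_minus_inf(literals)
    return any(bp(d) for d in range(delta))
```

Here `exists_x_case2([('lt', 0), ('div', 3, 2)])` succeeds via the residue 2 mod 3, whereas any `'gt'` literal rules this case out.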

In principle, the maximum of a function *y* = max<sub>x</sub> *f*(*x*) can be defined using two first-order quantifiers: ∀*x*. *f*(*x*) ≤ *y* and ∃*x*. *f*(*x*) = *y*. One might therefore be tempted to tackle our maximum elimination problem using quantifier elimination directly. We explored this possibility and found two serious drawbacks. First, the resulting formula does not yield a permission-typed expression that we can plug back into our analysis. Second, the resulting formulas are extremely large (e.g., for the copyEven example it yields several pages of specifications), and hard to simplify since relevant information is often spread across many terms due to the two separate quantifiers. Our maximum elimination algorithm addresses these drawbacks by natively working with arithmetic expressions, while mimicking the basic ideas of Cooper's algorithm and incorporating domain-specific optimisations.

#### **5.2 Maximum Elimination**

The first step is to reduce the problem of eliminating general max<sub>x | b</sub>(*p*) terms to those in which *b* and *p* come from a simpler restricted grammar. These *simple permission expressions p* do not contain general conditional expressions (*b* ? *p*<sub>1</sub> : *p*<sub>2</sub>), but instead only those of the form (*b* ? *r* : 0) (where *r* is a constant or rd). Furthermore, simple permission expressions only contain subtractions of the form *p* − (*b* ? *r* : 0). This is achieved in a precursory rewriting of the input expression by, for instance, distributing pointwise maxima over conditional expressions and binary maxima. For example, the pointwise maximum term (part of the copyEven example) max<sub>j | 0≤j<len(a)</sub>((j%2=0 ? (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? rd : 0) : (*q<sub>a</sub>*=a ∧ *q<sub>i</sub>*=j ? 1 : 0))) will be reduced to:

$$\max\Bigl(\max\_{j\,|\,0 \le j < \mathrm{len}(a) \wedge j\%2 = 0} \bigl((q\_a{=}a \wedge q\_i{=}j \text{ ? rd} : 0)\bigr),\ \max\_{j\,|\,0 \le j < \mathrm{len}(a) \wedge j\%2 \ne 0} \bigl((q\_a{=}a \wedge q\_i{=}j \text{ ? } 1 : 0)\bigr)\Bigr)$$
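This rewriting can be checked numerically: distributing the pointwise maximum over the conditional must preserve the value for every query index. A brute-force Python sketch, with an arbitrary concrete value standing in for rd and *q<sub>a</sub>* = a assumed, is:

```python
RD = 0.25   # arbitrary stand-in for the symbolic read permission rd

def term(qi, j):
    """Body of the original pointwise maximum for query index qi."""
    return (RD if qi == j else 0) if j % 2 == 0 else (1 if qi == j else 0)

def original(length, qi):
    """Pointwise maximum over 0 <= j < length of the conditional term."""
    return max((term(qi, j) for j in range(length)), default=0)

def rewritten(length, qi):
    """Binary maximum of the two filtered pointwise maxima (even/odd j)."""
    even = max(((RD if qi == j else 0)
                for j in range(length) if j % 2 == 0), default=0)
    odd = max(((1 if qi == j else 0)
               for j in range(length) if j % 2 != 0), default=0)
    return max(even, odd)
```

Enumerating all query indices for a few array lengths confirms the two forms agree, including the empty-range corner case.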

**Fig. 5.** Filtered boundary expression computation.

**Arbitrarily-Small Values.** We exploit a high-level case-split in our algorithm design analogous to Cooper's: given a pointwise maximum expression max<sub>x | b</sub>(*p*), either a *smallest* value of *x* exists for which *p* has its maximal value (and *b* is true), or there are *arbitrarily small* values of *x* defining this maximal value. To handle the latter case, we define a completely analogous *p*<sub>−∞</sub>(*x*) function, which recursively replaces all boolean expressions *b* in *p* with *b*<sub>−∞</sub>(*x*) as computed by Cooper; we relegate the definition to Sect. B.3 of the TR [15]. We then use (*b′* ? *p′* : 0), where (*b′, δ*<sub>1</sub>) = *b*<sub>−∞</sub>(*x*) and (*p′, δ*<sub>2</sub>) = *p*<sub>−∞</sub>(*x*), as our expression in this case. Note that this expression still depends on *x* if it contains (non-)divisibility constraints; Theorem 4 shows how *x* can be eliminated using *δ*<sub>1</sub> and *δ*<sub>2</sub>.

**Selecting Boundary Expressions for Maximum Elimination.** Next, we consider the case of selecting an appropriate set of boundary expressions, given a max<sub>x | b</sub>(*p*) term. We define this first for *p* in isolation, and then give an extended definition accounting for *b*. Just as for Cooper's algorithm, the boundary expressions must be a set guaranteed to include the *smallest* value of *x* defining the maximum value in question. The set must be finite, and as small as possible for efficiency of our overall algorithm. We refine the notion of boundary expression, and compute a set of *pairs* (*e, b′*) of an integer expression *e* and its *filter condition b′*: the filter condition represents an additional condition under which *e* must be included as a boundary expression. In particular, in contexts where *b′* is false, *e* can be ignored; this gives us a way to symbolically define an ultimately smaller set of boundary expressions, particularly in the absence of contextual information which might later show *b′* to be false. We call these pairs *filtered boundary expressions*.

**Definition 1 (Filtered Boundary Expressions).** *The* filtered boundary expression computation for *x* in *p*, *written p<sub>smallmax</sub>*(*x*)*, returns a pair of a set T of pairs* (*e, b′*) *and an integer constant δ, as defined in Fig. 5. This definition is also overloaded with a definition of* filtered boundary expression computation for (*x* | *b*) in *p*, *written* (*p, b*)<sub>smallmax</sub>(*x*)*.*

Just as for Cooper's *b<sub>small</sub>*(*x*) computation, our function *p<sub>smallmax</sub>*(*x*) computes the set *T* of (*e, b′*) pairs along with a single integer constant *δ*, which is the least common multiple of the divisors occurring in *p*; the desired smallest value of *x* may actually be some *e* + *d* where *d* ∈ [0*, δ* − 1]. There are three key points to Definition 1 which ultimately make our algorithm efficient:

First, the case for (*b* ? *p* : 0)<sub>smallmax</sub>(*x*) only includes boundary expressions for making *b* *true*. The case of *b* being false (from the structure of the permission expression) is not relevant for maximising the permission expression's value (note that this case never applies under a subtraction operator, due to our simplified grammar and the fact that the case for subtraction does not recurse into the right-hand operand).

Second, the case for *p*<sub>1</sub> − (*b* ? *p* : 0)<sub>smallmax</sub>(*x*) dually only considers boundary expressions for making *b* *false* (along with the boundary expressions for maximising *p*<sub>1</sub>). The filter condition *p*<sub>1</sub> > 0 is used to drop the boundary expressions for making *b* false: if *p*<sub>1</sub> is not strictly positive, we know that the evaluation of the whole permission expression will not yield a strictly positive value, and hence does not provide an interesting boundary value for a non-negative maximum.

Third, in the overloaded definition of (*p, b*)<sub>smallmax</sub>(*x*), we combine boundary expressions for *p* with those for *b*. The boundary expressions for *b* are, however, superfluous *if*, in analysing *p*, we have already determined a value for *x* which maximises *p* and happens to satisfy *b*. If all boundary expressions for *p* (whose filter conditions are true) make *b* true, *and* all non-trivial (i.e., strictly positive) evaluations of *p*<sub>−∞</sub>(*x*) used for potentially defining *p*'s maximum value also satisfy *b*, then we can safely discard the boundary expressions for *b*.

We are now ready to reduce pointwise maximum expressions to equivalent maximum expressions over finitely many cases:

**Theorem 4 (Simple Maximum Expression Elimination).** *For any pair* (*p, b*)*, if* ⊨ *p* ≥ 0*, then we have:*

$$\models \max\_{x \mid b} p = \max \left( \max\_{\substack{(e, b'') \in T \\ d \in [0, \delta - 1]}} (b'' \wedge b[e + d/x] \, ? \, p[e + d/x] \, : \, 0),\ \max\_{d \in [0, \mathrm{lcm}(\delta\_1, \delta\_2) - 1]} (b'[d/x] \, ? \, p'[d/x] \, : \, 0) \right)$$

*where* (*T, δ*) = (*p, b*)<sub>smallmax</sub>(*x*)*,* (*b′, δ*<sub>1</sub>) = *b*<sub>−∞</sub>(*x*) *and* (*p′, δ*<sub>2</sub>) = *p*<sub>−∞</sub>(*x*)*.*

To see how our filter conditions help to keep the set *T* (and therefore the first iterated maximum on the right of the equality in the above theorem) small, consider the example max<sub>x | x≥0</sub>((*x*=*i* ? 1 : 0)) (so *p* is (*x*=*i* ? 1 : 0), while *b* is *x* ≥ 0). In this case, evaluating (*p, b*)<sub>smallmax</sub>(*x*) yields the set *T* = {(*i*, true), (0, *i* < 0)}, with the meaning that the boundary expression *i* is considered in all cases, while the boundary expression 0 is only of interest if *i* < 0. The first iterated maximum term would be max((true ∧ *i*≥0 ? (*i*=*i* ? 1 : 0) : 0), (*i*<0 ∧ 0≥0 ? (0=*i* ? 1 : 0) : 0)). We observe that the term corresponding to the boundary value 0 can be simplified to 0, since it contains the two contradictory conditions *i* < 0 and 0 = *i*. Thus, the entire maximum can be simplified to (*i*≥0 ? 1 : 0). Without the filter conditions the result would instead be max((*i*≥0 ? 1 : 0), (0=*i* ? 1 : 0)). In the context of our permission analysis, the filter conditions allow us to avoid generating boundary expressions corresponding, e.g., to the integer loop invariants, provided that the expressions generated by analysing the permission expression in question already suffice. We employ aggressive syntactic simplification of the resulting expressions in order to exploit these filter conditions and produce succinct final answers.
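The claimed simplification can be confirmed by brute force over a bounded range (a sanity sketch only; the paper's algorithm is symbolic and the bound is an assumption of this check):

```python
def pointwise_max(i, bound=100):
    """Brute-force max over x in [0, bound) of (x == i ? 1 : 0),
    approximating the unbounded maximum over x >= 0."""
    return max(((1 if x == i else 0) for x in range(bound)), default=0)

def simplified(i):
    """The closed form produced after maximum elimination."""
    return 1 if i >= 0 else 0
```

For every `i` strictly below the bound, the enumerated maximum and the closed form coincide.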

# **6 Implementation and Experimental Evaluation**

We have developed a prototype implementation of our permission inference. The tool is written in Scala and accepts programs written in the Viper language [38], which provides all the features needed for our purposes.

Given a Viper program, the tool first performs a forward numerical analysis to infer the over-approximate loop invariants needed for our handling of loops. The implementation is parametric in the numerical abstract domain used for the analysis; we currently support the abstract domains provided by the Apron library [24]. As we have yet to integrate an implementation of under-approximate invariant inference (e.g., [35]), we rely on user-provided invariants, or assume them to be false if none are provided. In a second step, our tool performs the inference and the maximum elimination. Finally, it annotates the input program with the inferred specifications.

We evaluated our implementation on 43 programs taken from various sources; included are all programs that do not contain strings from the array memory safety category of SV-COMP 2017, all programs from Dillig et al. [14] (except three examples involving arrays of arrays), loop parallelisation examples from VerCors [5], and a few programs that we crafted ourselves. We manually checked that our soundness condition holds for all considered programs. The parallel loop examples were encoded as two consecutive loops where the first one models the forking of one thread per loop iteration (by iteratively exhaling the permissions required for all loop iterations), and the second one models the joining of all these threads (by inhaling the permissions that are left after each loop iteration). For the numerical analysis we used the *polyhedra abstract domain* provided by Apron. The experiments were performed on a dual core machine with a 2.60 GHz Intel Core i7-6600U CPU, running Ubuntu 16.04.

An overview of the results is given in Table 1. For each program, we compared the size and precision of the inferred specification against hand-written ones. The running times were measured by first running the analysis 50 times to warm up the JVM and then computing the average time needed over the next 100 runs. The results show that the inference is very efficient. The inferred specifications are concise for the vast majority of the examples. In 35 out of 48 cases, our tool inferred precise specifications. Most of the imprecisions are due to the inferred numerical loop invariants. In all cases, manually strengthening the invariants yields a precise specification. In one example, the source of imprecision is our abstraction of array-dependent conditions (see Sect. 4).

### **7 Related Work**

Much work is dedicated to the analysis of array programs, but most of it focuses on array content, whereas we infer permission specifications. The simplest approach consists of "smashing" all array elements into a single memory location [4]. This is generally quite imprecise, as only weak updates can be performed on the smashed array. A simple alternative is to consider array elements as distinct variables [4], which is feasible only when the length of the array is statically known. More advanced approaches perform syntax-based [18,22,25] or semantics-based [12,34] partitions of an array into symbolic segments. These require segments to be contiguous (with the exception of [34]) and, unlike our approach, do not easily generalise to multidimensional arrays. Gulwani et al. [20] propose an approach for inferring quantified invariants for arrays by lifting quantifier-free abstract domains. Their technique requires templates for the invariants.

Dillig et al. [14] avoid an explicit array partitioning by maintaining constraints that over- and under-approximate the array elements being updated by a program statement. Their work employs a technique for directly generalising the analysis of a single loop iteration (based on quantifier elimination), which works well when different loop iterations write to disjoint array locations. Gedell and Hähnle [17] provide an analysis which uses a similar criterion to determine that it is safe to parallelise a loop, and treat its heap updates as one bulk effect. The condition for our projection over loop iterations is weaker, since it allows the same array location to be updated in multiple loop iterations (like for example in sorting algorithms). Blom et al. [5] provide a specification technique for a variety of parallel loop constructs; our work can infer the specifications which their technique requires to be provided.

Another alternative for generalising the effect of a loop iteration is to use a first order theorem prover as proposed by Kovács and Voronkov [28]. In their work, however, they did not consider nested loops or multidimensional arrays. Other works rely on loop acceleration techniques [1,7]. In particular, like ours, the work of Bozga et al. [7] does not synthesise loop invariants; they directly infer post-conditions of loops with respect to given preconditions, while we additionally infer the preconditions. The acceleration technique proposed in [1] is used for the verification of array programs in the tool Booster [2].

Monniaux and Gonnord [36] describe an approach for the verification of array programs via a transformation to array-free Horn clauses. Chakraborty et al. [10] use heuristics to determine the array accesses performed by a loop iteration and split the verification of an array invariant accordingly. Their non-interference condition between loop iterations is similar to, but stronger than our soundness condition (cf. Sect. 4). Neither work is concerned with specification inference.

A wide range of static/shape analyses employ tailored separation logics as abstract domain (e.g., [3,9,19,29,41]); these works handle recursively-defined data structures such as linked lists and trees, but not random-access data structures such as arrays and matrices. Of these, Gulavani et al. [19] is perhaps closest to our work: they employ an integer-indexed domain for describing recursive data structures. It would be interesting to combine our work with such separation logic shape analyses. The problems of automating biabduction and entailment checking for array-based separation logics have been recently studied by Brotherston et al. [8] and Kimura and Tatsuta [27], but have not yet been extended to handle loop-based or recursive programs.

### **8 Conclusion and Future Work**

We presented a precise and efficient permission inference for array programs. Although our inferred specifications contain redundancies in some cases, they are human-readable. Our approach integrates well with permission-based inference for other data structures and with permission-based program verification.

As future work, we plan to use SMT solving to further simplify our inferred specifications, to support arrays of arrays, and to extend our work to an interprocedural analysis and explore its combination with biabduction techniques.

**Acknowledgements.** We thank Seraiah Walter for his earlier work on this topic, and Malte Schwerhoff and the anonymous reviewers for their comments and suggestions. This work was supported by the Swiss National Science Foundation.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Program Analysis Is Harder Than Verification: A Computability Perspective**

Patrick Cousot<sup>1</sup>, Roberto Giacobazzi<sup>2,3</sup>, and Francesco Ranzato<sup>4(B)</sup>

<sup>1</sup> New York University, New York City, USA
<sup>2</sup> University of Verona, Verona, Italy
<sup>3</sup> IMDEA Software Institute, Madrid, Spain
<sup>4</sup> University of Padova, Padova, Italy
ranzato@math.unipd.it

**Abstract.** We study, from a computability perspective, static program analysis, namely the detection of sound program assertions, and verification, namely the sound checking of program assertions. We first design a general computability model for domains of program assertions and the corresponding program analysers and verifiers. Next, we formalize and prove an instantiation of Rice's theorem for static program analysis and verification. Then, within this general model, we provide and prove a precise statement of the popular belief that program analysis is a harder problem than program verification: we prove that for finite domains of program assertions, program analysis and verification are equivalent problems, while for infinite domains, program analysis is strictly harder than verification.

# **1 Introduction**

It is common to assume that program analysis is harder than program verification (e.g., [1,17,22]). The intuition is that this is because in program analysis we need to synthesize a correct program invariant, while in program verification we *just* have to check whether a given program invariant is correct. The distinction between checking a proof and computing a witness for that proof can be traced back to Leibniz [18] with his *ars iudicandi* and *ars inveniendi*, respectively representing the analytic and the synthetic method. In Leibniz's *ars combinatoria*, the ars inveniendi is defined as the art of discovering "correct" questions, while the ars iudicandi is defined as the art of discovering "correct" answers. These foundational aspects of mathematical reasoning take on a peculiar meaning when the questions and answers concern the behaviour of computer programs as objects of our investigation.

Our main goal is to define a general and precise model for reasoning on the computability aspects of the notions of (sound or complete) static analyser and verifier for generic programs (viz. Turing machines). Both static analysers and verifiers assume a given domain A of abstract program assertions, which may range from syntactic program properties (e.g., program sizes or LOCs) to complexity properties (e.g., number of execution steps in some abstract machine) and all the semantic properties of the program behaviour (e.g., value range of program variables or shape of program memories). A program analyser is defined to be any total computable (i.e., total recursive) function that for any program P returns an assertion a<sub>P</sub> in A; it is sound when the concrete meaning of the assertion a<sub>P</sub> includes P. Instead, a program verifier is a (total) decision procedure capable of checking whether a given program P satisfies a given assertion a ranging in A, answering "true" or "don't know"; it is sound when a positive check of a for P means that the concrete meaning of the assertion a includes P. Completeness, which coupled with soundness is here called precision, holds for a program analyser when, for any program P, it returns the strongest assertion in A for P, while a program verifier is called precise if it is able to prove any true assertion in A for a program P. This general and minimal model allows us to extend to static program analysis and verification some standard results and methods of computability theory. We provide an instance of the well-known Rice's Theorem [29] for generic analysers and verifiers, by proving that sound and precise analysers (resp. verifiers) exist only for trivial domains of assertions.
This allows us to generalise known results about undecidability of program analysis, such as the undecidability of the meet-over-all-paths (MOP) solution for monotone dataflow analysis frameworks [15], making them independent of the structure of the domain of assertions. Then, we define a model for comparing the relative "verification power" of program analysers and verifiers. In this model, a verifier V on a domain A of assertions is more precise than an analyser A on the same domain A when any assertion a in A which can be proved by A for a program P (this means that the output A(P) of the analyser is stronger than the assertion a) can also be proved by V. Conversely, A is more precise than V when any assertion a proved by V can also be proved by A. We prove that while it is always possible to constructively transform a program analyser into an equivalent verifier (i.e., one with the same verification power), the converse does not hold in general. In fact, we first show that for *finite* domains of assertions, any "reasonable" verifier can be constructively transformed into an equivalent analyser, where reasonable means that the verifier V is: (i) nontrivial: for any program, V is capable of proving some assertion, possibly a trivially true assertion; (ii) monotone: if V proves an assertion a and a is stronger than a′, then V is also capable of proving a′; (iii) logically meet-closed: if V proves both a<sub>1</sub> and a<sub>2</sub> and the logical conjunction a<sub>1</sub> ∧ a<sub>2</sub> is a representable assertion, then V is also capable of proving it. Next, we prove the following impossibility result: for any *infinite* abstract domain of assertions A, no constructive reduction from reasonable verifiers on A to equivalent analysers on A is possible.
This provides, to the best of our knowledge, the first formalization of the common folklore that program analysis is harder than program verification.
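The easy direction of this comparison, transforming an analyser into an equivalent verifier, can be sketched directly: the induced verifier proves a for P exactly when the analyser's output is stronger than a. The toy domain below (a bounded "sign-like" domain with an explicit meaning function, and an analyser fixed to one output) is an illustrative assumption, not the paper's formalism.

```python
# Toy assertion domain: names mapped to their concrete meanings over a
# small bounded universe; leq decides the implication a1 <=_gamma a2.
MEANING = {'bot': set(),
           'even': {0, 2, 4},
           'nat': {0, 1, 2, 3, 4},
           'top': set(range(-4, 5))}

def leq(a1, a2):
    return MEANING[a1] <= MEANING[a2]

def verifier_from_analyser(analyser):
    """Constructive reduction: the verifier answers 'true' for (P, a)
    iff the analyser's output for P is stronger than a."""
    def verify(program, assertion):
        return 'true' if leq(analyser(program), assertion) else "don't know"
    return verify

# Hypothetical analyser that always infers 'even' for its input program.
V = verifier_from_analyser(lambda program: 'even')
```

Since 'even' is stronger than 'nat', the induced verifier proves 'nat'; it answers "don't know" for 'bot', which the analyser's output does not imply.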

# **2 Background**

We follow the standard terminology and notation for sets and computable functions in recursion theory (e.g., [12,26,30]). If X and Y are sets then X → Y and X ⇀ Y denote, respectively, the set of all total and partial functions from X to Y. If f : X ⇀ Y then f(x)↓ and f(x)↑ mean that f is defined/undefined on x ∈ X. Hence dom(f) = {x ∈ X | f(x)↓}. If S ⊆ Y then f(x) ∈ S denotes the implication f(x)↓ ⇒ f(x) ∈ S. If f, g : X ⇀ Y then f = g means that dom(f) = dom(g) and, for any x ∈ dom(f) = dom(g), f(x) = g(x). The set of all partial (resp. total) recursive functions on natural numbers is denoted by ℕ ⇀<sub>r</sub> ℕ (resp. ℕ →<sub>r</sub> ℕ). Recall that A ⊆ ℕ is a recursively enumerable (r.e., or semidecidable) set if A = dom(f) for some f ∈ ℕ ⇀<sub>r</sub> ℕ, while A ⊆ ℕ is a recursive (or decidable) set if both A and its complement Ā = ℕ ∖ A are recursively enumerable; this happens exactly when there exists f ∈ ℕ →<sub>r</sub> ℕ such that f = λn. (n ∈ A ? 1 : 0).
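The final characterisation can be illustrated with the decidable set of even numbers: a total function plays the role of λn. (n ∈ A ? 1 : 0), while a partial function whose domain is A witnesses semidecidability (an illustrative sketch; the names are ours).

```python
# Illustrative decidable set: A = the even natural numbers.

def chi(n):
    """Total recursive characteristic function: lambda n. n in A ? 1 : 0."""
    return 1 if n % 2 == 0 else 0

def semi(n):
    """Partial function with dom(semi) = A: halts (returning 1) iff n is
    even, and diverges on the complement of A."""
    while n % 2 != 0:
        pass                      # deliberate divergence outside A
    return 1
```

Calling `semi` on an odd number never returns, which is exactly how "undefined" encodes membership failure in a semidecision procedure.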

Let Prog denote some deterministic programming language which is Turing complete. More precisely, this means that for any partial recursive function f : ℕ ⇀<sub>r</sub> ℕ there exists a program P ∈ Prog such that ⟦P⟧ ≅ f, where ⟦P⟧ : D ⇀ D is a denotational input/output semantics of P on a domain D of input/output values for Prog, undefinedness encodes nontermination, and ≅ means equality up to some recursive encoding enc : D →<sub>r</sub> ℕ and decoding dec : ℕ →<sub>r</sub> D functions, i.e., f = enc ∘ ⟦P⟧ ∘ dec. We also assume a small-step transition relation ⇒ ⊆ (Prog × D) × ((Prog × D) ∪ D) for Prog defining an operational semantics which is functionally equivalent to the denotational semantics: ⟨P, i⟩ ⇒* o iff ⟦P⟧i = o. By an abuse of notation, we will identify the input/output semantics of a program P with the partial recursive function computed by P, i.e., we will consider programs P ∈ Prog whose input/output semantics is a partial recursive function ⟦P⟧ : ℕ ⇀<sub>r</sub> ℕ, so that, by Turing completeness, {⟦P⟧ : ℕ ⇀<sub>r</sub> ℕ | P ∈ Prog} = ℕ ⇀<sub>r</sub> ℕ.

### **3 Abstract Domains**

Static program analysis and verification are always defined with respect to a given (denumerable) domain of program assertions, which we call here an *abstract domain* [7], where the meaning of assertions is formalized by a function that induces a logical implication relation between assertions.

**Definition 3.1 (Abstract Domain).** An *abstract domain* is a tuple ⟨A, γ, ≤γ⟩ such that:

(1) A is a recursive (i.e., decidable) set of abstract values (machine-representable program assertions);
(2) γ : A → ℘(Prog) is a concretization function;
(3) ≤γ ≜ {(a1, a2) ∈ A × A | γ(a1) ⊆ γ(a2)} is a decidable relation on A.

An abstract element a ∈ A such that γ(a) = Prog is called an *abstract top*, while a is called an *abstract bottom* when γ(a) = ∅.

The elements of A are called assertions or abstract values, γ is called the concretization function (this may also be a nonrecursive function, which is typical of abstract domains representing semantic program properties), and ≤γ is called the implication or approximation relation of A. Thus, in this general model, a program assertion a ∈ A plays the role of an abstract representation of a program property γ(a) ∈ ℘(Prog), while the comparison a1 ≤γ a2 holds when a1 is a stronger (or more precise) property than a2. Let us also observe that, as a limit case, Definition 3.1 allows an abstract domain to be empty: the tuple ⟨∅, ∅, ∅⟩ satisfies the definition of abstract domain, where ∅ denotes the empty set, the empty function (i.e., the unique subset of ∅ × ∅) and the empty relation.

**Example 3.2.** Let us give some simple examples of abstract domains.


Definition 3.1 does not require injectivity of the concretization function γ; thus multiple assertions could have the same meaning. Two abstract values a1, a2 ∈ A are called equivalent when γ(a1) = γ(a2). Let us observe that since ≤γ is required to be decidable, the equivalence γ(a1) = γ(a2) is decidable as well. For example, for the well-known numerical abstract domain of convex polyhedra [11], represented through linear constraints between program variables, we may well have multiple representations of the same polyhedron, e.g., P1 = {x = z, z ≤ y} and P2 = {x = z, x ≤ y} both represent the same polyhedron. Thus, in general, an abstract domain A is not required to be partially ordered by ≤γ. On the other hand, the relation ≤γ is clearly a preorder on A. The only basic requirement is that for any pair of abstract values a1, a2 ∈ A, one can decide whether a1 is a more precise program assertion than a2, i.e., whether γ(a1) ⊆ γ(a2) holds. In this sense, we do not require that a partial order ≤ be defined a priori on A and that γ be monotone w.r.t. ≤, since for our purposes it is enough to consider the preorder ≤γ induced by γ.
If instead A is endowed with a partial order ≤A and A is defined in abstract interpretation [7,8] through a Galois insertion based on the concretization map γ, then it turns out that γ(a1) ⊆ γ(a2) ⇔ a1 ≤A a2 holds, so that the decidability of the relation ≤γ = {(a1, a2) ∈ A × A | γ(a1) ⊆ γ(a2)} boils down to the decidability of the partial order ≤A. As an example, it is well known that the abstract domain of polyhedra does not admit a Galois insertion [11]; nevertheless, its induced preorder relation ≤γ is decidable: for example, for polyhedra represented by linear constraints, there exist algorithms for deciding whether γ(P1) ⊆ γ(P2) for any pair of convex polyhedra representations P1 and P2 (see, e.g., [23, Sect. 5.3]).
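The requirements of Definition 3.1 can be illustrated by a deliberately tiny executable model (our own sketch, not from the literature): assertions are names, γ maps each name to a finite concrete property, and ≤γ is decided by subset inclusion. Two distinct names share the same meaning, so ≤γ is a decidable preorder but not a partial order, exactly as discussed above.

```python
# Toy abstract domain (illustrative sketch): assertions are names, and
# gamma maps each name to a concrete property, here a set of permitted
# output values modulo 3.  Distinct names may share a meaning, so the
# induced implication <=_gamma is only a preorder, not a partial order.
GAMMA = {
    "bot":         frozenset(),           # abstract bottom: gamma = empty set
    "even":        frozenset({0, 2}),
    "zero-or-two": frozenset({0, 2}),     # same meaning as "even"
    "top":         frozenset({0, 1, 2}),  # abstract top: gamma = everything
}

def leq_gamma(a1, a2):
    """Decidable implication preorder: gamma(a1) subseteq gamma(a2)."""
    return GAMMA[a1] <= GAMMA[a2]

# "even" and "zero-or-two" are equivalent yet distinct assertions,
# so <=_gamma is not antisymmetric:
assert leq_gamma("even", "zero-or-two") and leq_gamma("zero-or-two", "even")
assert leq_gamma("bot", "even") and leq_gamma("even", "top")
```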

#### **3.1 Abstract Domains in Abstract Interpretation**

An abstract domain in standard abstract interpretation [7–9] is usually defined by a poset ⟨A, ≤A⟩ containing a top element ⊤ ∈ A and a concretization map γA : A → ℘(Dom), where Dom denotes some concrete semantic domain (e.g., program stores or program traces), such that: (a) A is machine representable, namely, the abstract elements of A are encoded by some data structures (e.g., tuples, vectors, lists, matrices, etc.) and some algorithms are available for deciding whether a1 ≤A a2 holds; (b) a1 ≤A a2 ⇔ γA(a1) ⊆ γA(a2) holds (this equivalence always holds for Galois insertions); (c) γA(⊤) = Dom. Let us point out that Definition 3.1 is very general, since the concretization of an abstract value can be any program property, possibly a purely syntactic property or some space or time complexity property, as in the simple cases of Example 3.2 (1)-(2)-(5).

Let γA : A → ℘(Dom) and assume that Dom consists of program stores, namely Dom ≜ Var → Val, where Var is a finite set of program variables and Val is a corresponding denumerable set of values. Since Var → Val has a finite domain and a denumerable range, we can assume a recursive encoding of finite tuples of values into natural numbers, i.e., Var → Val ≅ N, and define γA : A → ℘(N). This is equivalent to assuming that programs have one single variable, say x, which may assume tuples of values in Val. A set of numbers γA(a) ∈ ℘(N) is meant to represent a property of the values stored in the program variable x at the end of the program execution, that is, if the program terminates its execution then the variable x stores a value in γA(a). Hence, as usual, the property ∅ ∈ ℘(N) means that the program does not correctly terminate its execution, either by true nontermination or by some run-time error, namely, that the exit program point is not reachable. For simplicity, we do not consider intermediate program points and assertions in our semantics.

For an abstract domain ⟨A, γA, ≤A⟩ in standard abstract interpretation, the corresponding concretization function γ : A → ℘(Prog) of Definition 3.1 is defined as:

$$\gamma(a) \triangleq \{ P \in \text{Prog} \mid \forall i \in \mathbb{N}.\ \llbracket P \rrbracket(i) \in \gamma_A(a) \}$$

where we recall that ⟦P⟧(i) ∈ γA(a) means ⟦P⟧(i) = o ⇒ o ∈ γA(a). Hence, if A contains a top ⊤A and a bottom ⊥A such that γA(⊤A) = N and γA(⊥A) = ∅, then γ(⊤A) = Prog and γ(⊥A) = {P ∈ Prog | P never terminates}. Moreover, since γA is monotonic, we have that γ is monotonic as well. The fact that all the elements in A are machine representable boils down to the requirement that A is a recursive set, while the binary preorder relation ≤γ is decidable because a1 ≤A a2 ⇔ γ(a1) ⊆ γ(a2) holds and ≤A is decidable. This therefore defines an abstract domain according to Definition 3.1.

In this simple view of the abstract domain A, there is no input property for the variable x, meaning that at the beginning x may store any value. It is easy to generalize the above definition by requiring an input abstract property in A for x, so that the abstract domain is the Cartesian product A × A together with a concretization γ<sup>i/o</sup> : A × A → ℘(Prog) defined as follows:

$$\gamma^{i/o}(\langle a\_i, a\_o \rangle) \triangleq \{ P \in \text{Prog} \mid \forall i \in \mathbb{N}.\ i \in \gamma\_A(a\_i) \Rightarrow \llbracket P \rrbracket(i) \in \gamma\_A(a\_o) \}.$$

This is a generalization since, for any a ∈ A, we have that γ(a) = γ<sup>i/o</sup>(⟨⊤A, a⟩).

**Example 3.3 (Interval Abstract Domain).** Let Int be the standard interval domain [7] restricted to natural numbers in N, endowed with the standard subset ordering:

$$\text{Int} \triangleq \{ [a, b] \mid a, b \in \mathbb{N}, \ a \le b \} \cup \{ \bot\_{\text{Int}} \} \cup \{ [a, +\infty) \mid a \in \mathbb{N} \}$$

with concretization γInt : Int → ℘(N), where γInt(⊥Int) = ∅, γInt([a, b]) = [a, b] and γInt([0, +∞)) = N, so that [0, +∞) is also denoted by ⊤Int. Thus, here, for the concretization function γ : Int → ℘(Prog) we have that: γ(⊤Int) = Prog, γ(⊥Int) = {P ∈ Prog | ∀i. ⟦P⟧(i)↑}, and γ([a, +∞)) = {P ∈ Prog | ∀i ∈ N. ⟦P⟧(i)↓ ⇒ ⟦P⟧(i) ≥ a}. We also have the input/output concretization γ<sup>i/o</sup> : Int × Int → ℘(Prog), where

$$\gamma^{i/o}(\langle I, J\rangle) \triangleq \{ P \in \text{Prog} \mid \forall i \in \mathbb{N}.\ i \in \gamma\_{\text{Int}}(I) \Rightarrow \llbracket P\rrbracket(i) \in \gamma\_{\text{Int}}(J)\}. \qquad \square$$
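A minimal executable rendition of Int (our own sketch; `None` encodes +∞ and `BOT` stands for ⊥Int) makes the decidability of the ordering and of interval membership concrete:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Interval:
    """[a, b] over the naturals; b = None encodes +infinity;
    a = b = None encodes the bottom element."""
    a: Optional[int]
    b: Optional[int] = None

BOT = Interval(None, None)
TOP = Interval(0, None)          # [0, +inf) plays the role of top

def leq(i: Interval, j: Interval) -> bool:
    """Decidable ordering: gamma_Int(i) subseteq gamma_Int(j)."""
    if i == BOT:
        return True
    if j == BOT:
        return False
    upper_ok = j.b is None or (i.b is not None and i.b <= j.b)
    return i.a >= j.a and upper_ok

def gamma_has(i: Interval, n: int) -> bool:
    """Decides n in gamma_Int(i)."""
    if i == BOT:
        return False
    return i.a <= n and (i.b is None or n <= i.b)
```

Note that γInt itself returns (possibly infinite) sets, so what is computable is not the set but the membership and inclusion tests, which is all Definition 3.1 demands.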

### **4 Program Analysers and Verifiers**

In our model, the notions of program analyser and verifier are as general as possible.

**Definition 4.1 (Program Analyser).** Given an abstract domain ⟨A, γ, ≤γ⟩, a *program analyser* on A is any total recursive function A : Prog → A. The set of analysers on a given abstract domain A will be denoted by 𝔸A. An analyser A ∈ 𝔸A is *sound* if for any P ∈ Prog and a ∈ A,

$$\mathcal{A}(P) \leq\_{\gamma} a \Rightarrow P \in \gamma(a)$$

while A is *precise* if it is also complete, i.e., if the reverse implication also holds:

$$P \in \gamma(a) \Rightarrow \mathcal{A}(P) \leq\_{\gamma} a. \tag{7}$$

Notice that this definition of soundness is equivalent to the standard notion of sound static analysis, namely, for any program P, A(P) always outputs a program assertion which is satisfied by P, i.e., P ∈ γ(A(P)). Let us also note that on the empty abstract domain ∅, no analyser can be defined, simply because there exists no function in Prog → ∅. Instead, for a singleton abstract domain A• ≜ {•}, if A ∈ 𝔸A• is sound then γ(•) = Prog, so that • is necessarily an abstract top. Also, if the abstract domain A contains a top abstract value ⊤A ∈ A then, as expected, λP.⊤A is a trivially sound analyser on A. Finally, we observe that if A1 and A2 are both precise on the same abstract domain then A1 =γ A2, meaning that A1 and A2 coincide up to equivalent abstract values, i.e., γ ∘ A1 = γ ∘ A2. In fact, for any P ∈ Prog, P ∈ γ(A2(P)) implies γ(A1(P)) ⊆ γ(A2(P)) and P ∈ γ(A1(P)) implies γ(A2(P)) ⊆ γ(A1(P)), so that A1 =γ A2.
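As a concrete illustration of Definition 4.1 (our own sketch, not a construction from the paper), consider a toy language of expressions over one natural-number input x; a sound but non-precise analyser on the interval domain can be written in a few lines:

```python
# Toy programs over one natural-number input x:
#   ("x",)           -- the input variable
#   ("const", n)     -- a constant n in N
#   ("add", e1, e2)  -- addition
def run(e, i):
    """Input/output semantics [[e]](i)."""
    if e[0] == "x":
        return i
    if e[0] == "const":
        return e[1]
    return run(e[1], i) + run(e[2], i)

def analyse(e):
    """Total recursive analyser A : Prog -> Int returning (lo, hi),
    with hi = None encoding +inf.  Soundness: for every input i in N,
    run(e, i) lies in the interval analyse(e)."""
    if e[0] == "x":
        return (0, None)          # the input is an arbitrary natural number
    if e[0] == "const":
        return (e[1], e[1])
    (l1, h1), (l2, h2) = analyse(e[1]), analyse(e[2])
    return (l1 + l2, None if h1 is None or h2 is None else h1 + h2)

P = ("add", ("x",), ("const", 3))   # [[P]](i) = i + 3
assert analyse(P) == (3, None)      # sound: every output lies in [3, +inf)
```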

**Example 4.2.** Software metrics static analysers [35] deal with nonsemantic program properties, such as the domain in Example 3.2 (1). Bounded model checking [4,34] handles program properties such as those encoded by the domains of Example 3.2 (2)-(3). Complexity bound analysers such as [32,36] cope with domains of properties such as those in Example 3.2 (4)-(5). Numerical abstract domains used in program analysis (see [23]) include the interval abstraction described in Example 3.3.

**Definition 4.3 (Program Verifier).** Given an abstract domain ⟨A, γ, ≤γ⟩, a *program verifier* on A is any total recursive function V : Prog × A → {**t**, **?**}. The set of verifiers on a given abstract domain A will be denoted by 𝕍A. A verifier V ∈ 𝕍A is *sound* if for any P ∈ Prog and a ∈ A,

$$\mathcal{V}(P, a) = \mathbf{t} \implies P \in \gamma(a)$$

while V is *precise* if it is also complete, i.e., if the reverse implication also holds:

$$P \in \gamma(a) \Rightarrow \mathcal{V}(P, a) = \mathbf{t}.$$

A verifier V ∈ 𝕍A is *nontrivial* if for every program there exists at least one assertion which V is able to prove, i.e., for any P ∈ Prog there exists some a ∈ A such that V(P, a) = **t**. A verifier is defined to be *trivial* when it is not nontrivial.

A verifier V ∈ 𝕍A is *monotone* when the verification algorithm is monotone w.r.t. ≤γ, i.e., (V(P, a) = **t** ∧ a ≤γ a′) ⇒ V(P, a′) = **t**.

**Remark 4.4.** Let us observe some straight consequences of Definition 4.3.

(1) Notice that for any nonempty abstract domain A, λ(P, a). **?** is a legal and vacuously sound verifier. Also, if A = ∅ is the empty abstract domain then the empty verifier V : Prog × ∅ → {**t**, **?**} (namely, the function with empty graph) is trivially precise.

(2) Let us observe that if V is nontrivial and monotone then V is able to prove any abstract top: in fact, if ⊤ ∈ A and γ(⊤) = Prog then, for any P ∈ Prog, since there exists some a ∈ A such that V(P, a) = **t** and a ≤γ ⊤, by monotonicity, V(P, ⊤) = **t**.

(3) Note that if a verifier V is precise then V(P, a) = **?** ⇔ P ∉ γ(a), so that in this case an output V(P, a) = **?** always means that P does not satisfy the property a.

(4) Finally, if V1 and V2 are precise on the same abstract domain then V1(P, a) = **t** ⇔ P ∈ γ(a) ⇔ V2(P, a) = **t**, so that V1 = V2.

**Example 4.5.** Program verifiers abound in the literature, e.g., [3,21,27]. For example, [13] aims at complexity verification on domains like that in Example 3.2 (5), while reachability verifiers like [33] can check numerical properties of program variables such as those of Example 3.3.
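A sound analyser canonically induces a sound and monotone verifier by answering **t** exactly when the analyser's result implies the queried assertion (the same construction reappears in Lemma 6.3). Here is a self-contained toy sketch of this idea, with our own hypothetical expression language:

```python
# A sound, monotone verifier induced by a sound analyser (our sketch).
# Programs: ("x",) | ("const", n) | ("add", e1, e2), input x ranging over N;
# assertions: intervals (lo, hi) with hi = None encoding +inf.
def analyse(e):
    """Sound interval analyser for the toy expression language."""
    if e[0] == "x":
        return (0, None)
    if e[0] == "const":
        return (e[1], e[1])
    (l1, h1), (l2, h2) = analyse(e[1]), analyse(e[2])
    return (l1 + l2, None if h1 is None or h2 is None else h1 + h2)

def leq(i, j):
    """Decidable implication between interval assertions."""
    return i[0] >= j[0] and (j[1] is None or (i[1] is not None and i[1] <= j[1]))

def verify(e, a):
    """V(P, a) = 't' iff A(P) <=_gamma a: sound because analyse is sound,
    and monotone because leq is transitive."""
    return "t" if leq(analyse(e), a) else "?"

P = ("add", ("x",), ("const", 2))        # [[P]](i) = i + 2
assert verify(P, (2, None)) == "t"       # proved: output >= 2
assert verify(P, (5, None)) == "?"       # "output >= 5" is not a property of P
```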

# **5 Rice's Theorem for Static Program Analysis and Verification**

Classical Rice's Theorem in computability theory [26,29,30] states that an extensional property Π ⊆ N of an effective numbering {ϕn | n ∈ N} = N ⇀ᵣ N of the partial recursive functions is a recursive set if and only if Π = ∅ or Π = N, i.e., Π is trivial. Let us recall that Π ⊆ N is extensional when ϕn = ϕm implies n ∈ Π ⇔ m ∈ Π. When dealing with program properties rather than indices of partial recursive functions, i.e., when Π ⊆ Prog, Rice's Theorem states that any nontrivial semantic program property is undecidable (see [28] for a statement of Rice's Theorem tailored to program properties). It is worth recalling that Rice's Theorem has been extended by Asperti [2] through an interesting generalization to so-called "complexity cliques", namely, nonextensional program properties which may take into account the space or time complexity of programs: for example, the abstract domain of Example 3.2 (5) is not extensional, but when logically "intersected" with an extensional domain (i.e., in a product domain A1 × A2 whose concretization function is the set intersection λ⟨a1, a2⟩. γ1(a1) ∩ γ2(a2)) it falls into this generalized version of Rice's Theorem.

In the following, we provide an instantiation of Rice's Theorem to sound static program analysis and verification by introducing a notion of extensionality for abstract domains. Abstract domains commonly used in abstract interpretation turn out to be extensional when they are used for approximating the input/output behaviour of programs. For example, if a sound abstract interpretation of a program P in the interval abstract domain computes as abstract output a program assertion such as x ∈ [1, 5] and y ∈ [2, +∞), then this assertion is a sound abstract output for any other program Q having the same input/output behaviour as P.

**Definition 5.1 (Extensional Abstract Domain).** An abstract domain ⟨A, γ, ≤γ⟩ is *extensional* when for any a ∈ A, γ(a) ⊆ Prog is an extensional program property, namely, if ⟦P⟧ = ⟦Q⟧ then P ∈ γ(a) ⇔ Q ∈ γ(a).

As usual, the intuition is that an extensional program property depends exclusively on the input/output program semantics ⟦·⟧. As a simple example, the domains of Example 3.2 (3)-(4) are extensional while the domains of Example 3.2 (1)-(2)-(5) are not.

**Definition 5.2 (Trivial Abstract Domain).** An abstract domain ⟨A, γ, ≤γ⟩ is *trivial* when A contains abstract bottom or top elements only, i.e., for any a ∈ A, γ(a) ∈ {∅, Prog}.

Definition 5.2 allows four possible types of trivial abstract domain A: (1) A = ∅; (2) A is nonempty and consists of bottom elements only, i.e., A ≠ ∅ and for all a ∈ A, γ(a) = ∅; (3) A is nonempty and consists of top elements only, i.e., A ≠ ∅ and for all a ∈ A, γ(a) = Prog; (4) A ≠ ∅ and A contains both bottom and top elements.

**Theorem 5.3 (Rice's Theorem for Program Analysis).** *Let* ⟨A, γ, ≤γ⟩ *be an extensional abstract domain and let* A ∈ 𝔸A *be a sound analyser. Then,* A *is precise iff* A *is trivial.*

*Proof.* Since we assume the existence of a sound analyser A ∈ 𝔸A on the extensional abstract domain A, observe that necessarily A ≠ ∅.

Assume that A is trivial. We have to show that for any a ∈ A and P ∈ Prog, A(P) ≤γ a ⇔ P ∈ γ(a). Assume that P ∈ γ(a) for some a ∈ A. Then, we have that γ(a) ≠ ∅, so that, since A is trivial, it must necessarily be that γ(a) = Prog. By soundness of A, P ∈ γ(A(P)), so that, since A is trivial, γ(A(P)) = Prog. Hence, we have that γ(A(P)) = γ(a), thus implying A(P) ≤γ a. On the other hand, if A(P) ≤γ a then γ(A(P)) ⊆ γ(a), so that, since, by soundness of A, P ∈ γ(A(P)), we also have that P ∈ γ(a).

Conversely, assume now that A is precise, namely, P ∈ γ(a) iff A(P) ≤γ a. Thus, since A is a total recursive function and ≤γ is decidable, for any a ∈ A, the question P ∈ γ(a) is decidable. Since γ(a) is an extensional program property, by Rice's Theorem, γ(a) must necessarily be trivial, i.e., γ(a) ∈ {∅, Prog}. This means that the abstract domain A is trivial.

Rice's Theorem for program analysis can be applied to several abstract domains. Due to lack of space, we just mention that the well-known undecidability of computing the meet over all paths (MOP) solution for a monotone dataflow analysis problem, proved by Kam and Ullman [15, Sect. 6] by resorting to undecidability of Post's Correspondence Problem, can be derived as a simple consequence of Theorem 5.3.

Along the same lines of Theorem 5.3, Rice's Theorem can be instantiated to program verification as follows.

**Theorem 5.4 (Rice's Theorem for Program Verification).** *Let* ⟨A, γ, ≤γ⟩ *be an extensional abstract domain and let* V ∈ 𝕍A *be a sound, nontrivial and monotone verifier. Then,* V *is precise iff* A *is trivial.*

*Proof.* Let A be an extensional abstract domain and V ∈ 𝕍A be sound and nontrivial. If A = ∅ then A is trivial, while the only possible verifier V : Prog × ∅ → {**t**, **?**} is the empty verifier, which is vacuously precise but not nontrivial. Thus, A ≠ ∅ holds.

Assume that V is precise, that is, P ∈ γ(a) iff V(P, a) = **t**. Hence, since V is a total recursive function, whether V(P, a) = **t** is decidable, so that the question P ∈ γ(a) is decidable as well. As in the proof of Theorem 5.3, since γ(a) is an extensional program property, by Rice's Theorem, γ(a) ∈ {∅, Prog}. Thus, the abstract domain A is trivial.

Conversely, let A ≠ ∅ be a trivial abstract domain. We have to prove that for any a ∈ A and P ∈ Prog, V(P, a) = **t** ⇔ P ∈ γ(a). Consider any a ∈ A. Since A is trivial, γ(a) ∈ {∅, Prog}. If γ(a) = ∅ then, by soundness of V, for any P ∈ Prog, V(P, a) = **?**, so that V(P, a) = **t** ⇔ P ∈ γ(a) holds (both sides are false). If, instead, γ(a) = Prog, i.e., a is an abstract top, then, since V is assumed to be nontrivial and monotone, by Remark 4.4 (2), V is able to prove the abstract top a for any program, namely, for any P ∈ Prog, V(P, a) = **t**, so that V(P, a) = **t** ⇔ P ∈ γ(a) holds.

Let us remark a noteworthy difference of Theorem 5.4 w.r.t. Rice's theorem for static analysis. Let us consider a trivial abstract domain A ≜ {⊤} with γ(⊤) = Prog. Here, the trivially sound analyser λP.⊤ is also precise, in accordance with Theorem 5.3. Instead, the trivially sound verifier V**?** ≜ λ(P, a). **?** is not precise, because P ∈ γ(⊤) ⇔ V**?**(P, ⊤) = **t** does not hold. The point here is that V**?** lacks the property of being nontrivial, and therefore Theorem 5.4 cannot be applied. On the other hand, V**t** ≜ λ(P, a). **t** is nontrivial and precise, because, in this case, P ∈ γ(⊤) ⇔ V**t**(P, ⊤) = **t** holds. Similarly, if we consider the trivial abstract domain A ≜ {⊤, ⊤′}, with γ(⊤) = Prog = γ(⊤′), then the verifier

$$\mathcal{V}'(P,a) \triangleq \begin{cases} \mathbf{t} & \text{if } a = \top \\ \mathbf{?} & \text{if } a = \top' \end{cases}$$

is sound and nontrivial, but still V′ is not precise, because P ∈ γ(⊤′) ⇔ V′(P, ⊤′) = **t** does not hold. The point here is that V′ is not monotone, because V′(P, ⊤) = **t** and ⊤ ≤γ ⊤′ but V′(P, ⊤′) ≠ **t**, so that Theorem 5.4 cannot be applied.
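This non-monotone counterexample can be replayed concretely (our own encoding of the two equivalent tops as strings):

```python
# Two distinct abstract tops with the same meaning: gamma(T) = gamma(T2) = Prog.
def leq_gamma(a1, a2):
    """gamma(a1) = gamma(a2) = Prog, so the implication always holds."""
    return True

def verify(P, a):
    """V'(P, a): proves T but never T2 -- sound, nontrivial, non-monotone."""
    return "t" if a == "T" else "?"

P = "any-program"
assert verify(P, "T") == "t"                             # nontrivial: T is provable
assert leq_gamma("T", "T2") and verify(P, "T2") == "?"   # monotonicity fails
# Hence V' is not precise: P satisfies gamma(T2) = Prog, yet V' answers '?'.
```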

# **6 Comparing Analysers and Verifiers**

Let us now focus on a model for comparing the relative precision of program analysers and verifiers w.r.t. a common abstract domain ⟨A, γ, ≤γ⟩.

**Definition 6.1 (Comparison Relations).** *Let* V, V′ ∈ 𝕍A, A, A′ ∈ 𝔸A, *and* X, Y ∈ 𝕍A ∪ 𝔸A.

$$\begin{array}{ll} (1)\ \ \mathcal{V} \sqsubseteq \mathcal{V}' & \textit{iff}\ \ \forall P \in \text{Prog}\ \forall a \in A.\ \mathcal{V}'(P, a) = \mathbf{t} \Rightarrow \mathcal{V}(P, a) = \mathbf{t} \\ (2)\ \ \mathcal{A} \sqsubseteq \mathcal{A}' & \textit{iff}\ \ \forall P \in \text{Prog}.\ \mathcal{A}(P) \leq\_{\gamma} \mathcal{A}'(P) \\ (3)\ \ \mathcal{V} \sqsubseteq \mathcal{A} & \textit{iff}\ \ \forall P \in \text{Prog}\ \forall a \in A.\ \mathcal{A}(P) \leq\_{\gamma} a \Rightarrow \mathcal{V}(P, a) = \mathbf{t} \\ (4)\ \ \mathcal{A} \sqsubseteq \mathcal{V} & \textit{iff}\ \ \forall P \in \text{Prog}\ \forall a \in A.\ \mathcal{V}(P, a) = \mathbf{t} \Rightarrow \mathcal{A}(P) \leq\_{\gamma} a \\ (5)\ \ \mathcal{X} \cong \mathcal{Y} & \textit{when}\ \ \mathcal{X} \sqsubseteq \mathcal{Y}\ \textit{and}\ \mathcal{Y} \sqsubseteq \mathcal{X} \end{array}$$

Let us comment on the previous definitions, which intuitively take into account the relative "verification powers" of verifiers and analysers. The relation V ⊑ V′ holds when every assertion proved by V′ can also be proved by V, while A ⊑ A′ means that the output assertion provided by A is more precise than that produced by A′. Also, a verifier V is more precise than an analyser A, written V ⊑ A, when the verification power of V is not less than the verification power of A, namely, any assertion a which can be proved by A for a program P, i.e. such that A(P) ≤γ a holds, can also be proved by V. Likewise, A is more precise than V, written A ⊑ V, when any assertion a proved by V can also be proved by A, i.e., V(P, a) = **t** implies A(P) ≤γ a.

Let us observe that ⟨𝕍A, ⊑⟩ turns out to be a poset, while ⟨𝔸A, ⊑⟩ is just a preordered set (cf. the lattice of abstract interpretations in [8]). We have that ⟨𝕍A, ⊑⟩ has a greatest element V**?** ≜ λ(P, a). **?**, which, in particular, is always sound although it is trivial. On the other hand, if A includes a top element ⊤ then λP.⊤ is a sound analyser which is a maximal element in ⟨𝔸A, ⊑⟩. Also, V ≅ V′ means that V = V′ as total functions, while A ≅ A′ means that γ ∘ A = γ ∘ A′. Moreover, the comparison relation ⊑ is transitive even when considering analysers and verifiers together: if V ⊑ A and A ⊑ V′ then V ⊑ V′, and if A ⊑ V and V ⊑ A′ then A ⊑ A′. Also, the relation ⊑ shifts soundness from verifiers to analysers, and from analysers to verifiers, as follows (due to lack of space the proof is omitted).
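On a finite toy universe, the comparison relations of Definition 6.1 become executable checks; in general, Prog is infinite, so these quantifications are not computable, and all names below are our own illustration:

```python
from itertools import product

# A finite toy universe: two programs and three assertions.
PROGS = ["p0", "p1"]
ASSERTS = [(0,), (1,), (2,)]      # (n,) means "the output is >= n"

def leq(i, j):
    """Implication: 'output >= i' implies 'output >= j' iff i >= j."""
    return i[0] >= j[0]

def v_below_v(v1, v2):
    """V1 below V2 (Definition 6.1 (1)): every assertion proved by V2
    is also proved by V1."""
    return all(v1(p, a) == "t"
               for p, a in product(PROGS, ASSERTS) if v2(p, a) == "t")

def v_below_a(v, an):
    """V below A (Definition 6.1 (3)): every assertion implied by A's
    output is proved by V."""
    return all(v(p, a) == "t"
               for p, a in product(PROGS, ASSERTS) if leq(an(p), a))

an = lambda p: (1,) if p == "p1" else (0,)           # a made-up analyser
v_strong = lambda p, a: "t" if leq(an(p), a) else "?"
v_trivial = lambda p, a: "?"

assert v_below_a(v_strong, an)            # v_strong matches the analyser
assert v_below_v(v_strong, v_trivial)     # and refines the trivial verifier
assert not v_below_v(v_trivial, v_strong)
```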

**Lemma 6.2.** *Let* V ∈ 𝕍A *and* A ∈ 𝔸A*. If* V *is sound and* V ⊑ A *then* A *is sound; if* A *is sound and* A ⊑ V *then* V *is sound.*

As expected, any sound analyser can be used to refine a given sound verifier (cf. [19,20,24,25]) and this can be formalized and proved in our framework as follows.

**Lemma 6.3.** *Given* A ∈ 𝔸A *and* V ∈ 𝕍A *which are both sound, let*

$$\tau\_{\mathcal{A}}(\mathcal{V})(P,a) \overset{\Delta}{=} \begin{cases} \mathbf{t} & \text{if } \mathcal{A}(P) \le\_{\gamma} a \\ \mathcal{V}(P,a) & \text{if } \mathcal{A}(P) \not\le\_{\gamma} a \end{cases}$$

*Then,* τA(V) ∈ 𝕍A *is sound,* τA(V) ⊑ V *and* τA(V) = V ⇔ V ⊑ A*.*

*Proof.* τA(V) ∈ 𝕍A is sound because both A and V are sound. If V(P, a) = **t** then τA(V)(P, a) = **t**, i.e., τA(V) ⊑ V. Moreover, τA(V) = V iff (A(P) ≤γ a ⇒ V(P, a) = **t**) iff V ⊑ A.
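The construction τA(V) is directly implementable once A, V and ≤γ are given as functions. The following sketch (with hypothetical toy instances of our own) shows the refined verifier proving strictly more than the trivial one:

```python
def tau(analyse, verify, leq):
    """Refine a sound verifier with a sound analyser:
    tau(A, V)(P, a) answers 't' if A(P) <=_gamma a, else falls back on V."""
    def refined(P, a):
        return "t" if leq(analyse(P), a) else verify(P, a)
    return refined

# Hypothetical toy instance: assertions and analysis results are lower
# bounds n meaning "the output is >= n"; b1 implies b2 iff b1 >= b2.
leq = lambda b1, b2: b1 >= b2
analyse = lambda P: 2 if P == "inc2" else 0   # a made-up sound analyser
verify = lambda P, a: "?"                     # the trivial sound verifier

v2 = tau(analyse, verify, leq)
assert v2("inc2", 1) == "t"    # the analyser's bound 2 proves "output >= 1"
assert v2("inc2", 3) == "?"    # falls back on V, which proves nothing
```

By construction, whenever `verify` answers **t** so does `v2`, which is exactly the refinement relation stated by the lemma.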

#### **6.1 Optimal and Best Analysers and Verifiers**

It makes sense to define optimality by restricting to sound analysers and verifiers only. Optimality is defined as minimality w.r.t. the precision relation ⊑, while being the best analyser/verifier means being the most precise one.

**Definition 6.4 (Optimal and Best Analysers and Verifiers).** A sound analyser A ∈ 𝔸A is *optimal* if for any sound A′ ∈ 𝔸A, A′ ⊑ A ⇒ A′ ≅ A, while A is a *best* analyser if for any sound A′ ∈ 𝔸A, A ⊑ A′.

A sound verifier V ∈ 𝕍A is *optimal* if for any sound V′ ∈ 𝕍A, V′ ⊑ V ⇒ V′ ≅ V, while V is the *best* verifier if for any sound V′ ∈ 𝕍A, V ⊑ V′.

Let us first observe that if a best verifier exists then it is unique, while if A1 and A2 are two best analysers on A then A1 ≅ A2 holds. Of course, the possibility of defining an optimal/best analyser or verifier depends on the abstract domain A. For example, for a variable sign domain such as {Z≤0, Z≥0, Z}, just optimal analysers and verifiers can be defined, because for approximating the set {0} two optimal sound abstract values are available rather than a best sound abstract value. Here, the expected but interesting property to remark is that the notion of precise (i.e., sound and complete) analyser turns out to coincide with the notion of best analyser.

**Lemma 6.5.** *Let* A ∈ 𝔸A *be sound. Then,* A *is precise iff* A *is a best analyser.*

*Proof.* (⇒) Consider any sound A′ ∈ 𝔸A. Assume, by contradiction, that A ⋢ A′, namely, that there exists some P ∈ Prog such that γ(A(P)) ⊈ γ(A′(P)). By soundness of A′, P ∈ γ(A′(P)), so that, by precision of A, γ(A(P)) ⊆ γ(A′(P)), which is a contradiction. Thus, A ⊑ A′ holds. This means that A is a best analyser on A.

(⇐) We have to prove that for any P ∈ Prog and a ∈ A, P ∈ γ(a) ⇒ γ(A(P)) ⊆ γ(a). Assume, by contradiction, that there exist Q ∈ Prog and b ∈ A such that Q ∈ γ(b) and γ(A(Q)) ⊈ γ(b). Then, we define A′ : Prog → A as follows:

$$\mathcal{A}'(P) \stackrel{\Delta}{=} \begin{cases} \mathcal{A}(P) & \text{if } P \not\equiv Q \\ b & \text{if } P \equiv Q \end{cases}$$

It turns out that A′ is a total recursive function because P ≡ Q is decidable. Moreover, A′ is sound: assume that γ(A′(P)) ⊆ γ(a); if P ≢ Q then A′(P) = A(P), so that γ(A(P)) ⊆ γ(a) and, by soundness of A, P ∈ γ(a); if P ≡ Q then A′(Q) = b, so that γ(b) = γ(A′(Q)) ⊆ γ(a), hence Q ∈ γ(b) implies Q ∈ γ(a). Since A is a best analyser on A, we have that A ⊑ A′, so that γ(A(Q)) ⊆ γ(A′(Q)) = γ(b), which is a contradiction.

We therefore derive the following consequence of Rice's Theorem 5.3 for static analysis: the best analyser on an extensional abstract domain A exists if and only if A is trivial. This fact formalizes, in our model, the common intuition that, given any abstract domain, the best static analyser (where best means best for any input program) cannot be defined, due to Rice's Theorem. An analogous result can be given for verifiers.

**Lemma 6.6.** *Let* 𝒱 ∈ 𝕍_A *be sound. Then,* 𝒱 *is precise iff* 𝒱 *is the best verifier on* A*.*

*Proof.* Assume that 𝒱 is precise and let 𝒱′ ∈ 𝕍_A be sound. If 𝒱′(P, a) = **t** then, by soundness of 𝒱′, ⟦P⟧ ∈ γ(a), and in turn, by completeness of 𝒱, 𝒱(P, a) = **t**, thus proving that 𝒱 ⊑ 𝒱′. On the other hand, assume that 𝒱 is the best verifier on A. Assume, by contradiction, that 𝒱 is not complete, namely, that there exist some Q ∈ Prog and b ∈ A such that ⟦Q⟧ ∈ γ(b) and 𝒱(Q, b) = **?**. We then define 𝒱′ : Prog × A → {**t**, **?**} as follows:

$$\mathcal{V}'(P,a) \stackrel{\scriptstyle \triangle}{=} \begin{cases} \mathbf{t} & \text{if } P \equiv Q \land a = b \\ \mathcal{V}(P,a) & \text{otherwise} \end{cases}$$

Then, 𝒱′ is a total recursive function because P ≡ Q and a = b are decidable. Also, 𝒱′ is sound because ⟦Q⟧ ∈ γ(b) and 𝒱 is sound. Since 𝒱 is the best verifier, we have that 𝒱 ⊑ 𝒱′, so that 𝒱′(Q, b) = **t** implies 𝒱(Q, b) = **t**, which is a contradiction.

Thus, similarly to static analysis, as a consequence of Rice's Theorem 5.4 for verification, the best nontrivial and monotone verifier on an extensional abstract domain A exists if and only if A is trivial; this matches a common belief in program verification. Let us also remark that best abstract program *semantics*, rather than program analysers, do exist for nontrivial domains (see, e.g., [6]). Clearly, this is not in contradiction with Theorem 5.3, since these abstract program semantics are not total recursive functions, i.e., they are not program analysers.

### **7 Reducing Verification to Analysis and Back**

As usual in computability and complexity, our comparison between verification and analysis is made through a many-one reduction, namely by reducing a verification problem into an analysis problem and vice versa. The minimal requirement is that these reduction functions are total recursive. Moreover, we require that the reduction function does not depend upon a fixed abstract domain. This allows us to be problem agnostic and to prove a reduction for all possible verifiers and analysers. Program verification and analysis are therefore equivalent problems whenever we can reduce one to the other. In the following, we prove that while it is always possible to transform a program analyser into an equivalent program verifier, the converse does not hold in general, but it can always be done for finite abstract domains.

#### **7.1 Reducing Verification to Analysis**

**Theorem 7.1.** *Let* ⟨A, γ, ≤γ⟩ *be any given abstract domain. There exists a transform* σ : 𝔸_A → 𝕍_A *such that:* (1) σ *is a total recursive function and, for all* 𝒜 ∈ 𝔸_A*,* σ(𝒜) ≅ 𝒜*;* (2) *if* 𝒜 *is sound then* σ(𝒜) *is sound;* (3) σ *is monotonic;* (4) σ *is injective up to* ≅*, i.e.,* σ(𝒜) ≅ σ(𝒜′) *implies* 𝒜 ≅ 𝒜′*.*


*Proof.* Given 𝒜 ∈ 𝔸_A, we define σ(𝒜) : Prog × A → {**t**, **?**} as follows:

$$\sigma(\mathcal{A})(P,a) \triangleq \begin{cases} \mathbf{t} & \text{if } \mathcal{A}(P) \leq_{\gamma} a \\ \mathbf{?} & \text{if } \mathcal{A}(P) \not\leq_{\gamma} a \end{cases}$$

(1) Since 𝒜 is a total recursive function and ≤γ is decidable, σ(𝒜) is a total recursive function, namely σ(𝒜) ∈ 𝕍_A, and σ is a total recursive function as well. Since, by definition, σ(𝒜)(P, a) = **t** ⇔ 𝒜(P) ≤γ a, we have that σ(𝒜) ≅ 𝒜. (2) By Lemma 6.2, if 𝒜 is sound then the equivalent verifier σ(𝒜) is sound as well. (3) It turns out that σ is monotonic: if 𝒜 ⊑ 𝒜′ then σ(𝒜′)(P, a) = **t** ⇔ 𝒜′(P) ≤γ a ⇒ 𝒜(P) ≤γ 𝒜′(P) ≤γ a, hence σ(𝒜)(P, a) = **t**, so that σ(𝒜) ⊑ σ(𝒜′) holds. (4) Assume that σ(𝒜) ≅ σ(𝒜′); hence, for any P ∈ Prog, σ(𝒜)(P, 𝒜(P)) = σ(𝒜′)(P, 𝒜(P)), namely, 𝒜(P) ≤γ 𝒜(P) ⇔ 𝒜′(P) ≤γ 𝒜(P), so that 𝒜′(P) ≤γ 𝒜(P) holds. On the other hand, 𝒜(P) ≤γ 𝒜′(P) can be dually obtained; therefore γ(𝒜(P)) = γ(𝒜′(P)) holds, namely, 𝒜 ≅ 𝒜′.

Intuitively, Theorem 7.1 shows that program verification on a given abstract domain A can always and unconditionally be reduced to program analysis on A. This means that a solution to the program analysis problem on A, i.e. the definition of an analyser A, can constructively be transformed into a solution to the program verification problem on the same domain A, i.e. the design of a verifier σ(A) which is equivalent to <sup>A</sup>. The proof of Theorem 7.1 provides this constructive transform σ, which is defined as expected: an analyser <sup>A</sup> on any (possibly infinite) abstract domain A can be used as a verifier for any assertion <sup>a</sup> <sup>∈</sup> <sup>A</sup> simply by checking whether <sup>A</sup>(P) <sup>≤</sup><sup>γ</sup> <sup>a</sup> holds or not.
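The transform σ is simple enough to sketch executably. The following Python sketch turns any total analyser into a verifier by checking 𝒜(P) ≤γ a; the toy sign domain, the string-valued abstract elements, and the trivially imprecise analyser are illustrative stand-ins, not artifacts of the paper.

```python
# Sketch of the transform sigma from Theorem 7.1 (illustrative stand-ins:
# the sign domain, its ordering, and the trivial analyser below).

def make_verifier(analyser, leq_gamma):
    """sigma(A): answers 't' iff analyser(P) <=_gamma a, else '?'."""
    def verifier(program, assertion):
        return 't' if leq_gamma(analyser(program), assertion) else '?'
    return verifier

# Toy sign domain {NEG0, POS0, INT} with NEG0 <=_gamma INT, POS0 <=_gamma INT.
ORDER = {('NEG0', 'NEG0'), ('POS0', 'POS0'), ('INT', 'INT'),
         ('NEG0', 'INT'), ('POS0', 'INT')}
leq_gamma = lambda a, b: (a, b) in ORDER

# A sound but maximally imprecise analyser mapping every program to INT.
top_analyser = lambda program: 'INT'

verify = make_verifier(top_analyser, leq_gamma)
print(verify("x := 0", 'INT'))   # t
print(verify("x := 0", 'POS0'))  # ?
```

Note how soundness transfers for free: whatever the analyser answers, the induced verifier only claims **t** for weaker assertions.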

### **7.2 Reducing Analysis to Verification**

It turns out that the converse of Theorem 7.1 does not hold: a program analysis problem in general cannot be reduced to a verification problem. This reduction can, however, always be done for finite abstract domains. Given a verifier 𝒱 ∈ 𝕍_A, for any program P ∈ Prog, let us define 𝒱**t**(P) ≜ {a ∈ A | 𝒱(P, a) = **t**}, namely, 𝒱**t**(P) is the set of assertions proved by 𝒱 for P. Also, given an assertion a ∈ A, we define ↑a ≜ {a′ ∈ A | a ≤γ a′} as the set of assertions weaker than a. The following result provides a useful characterization of the equivalence between verifiers and analysers.

**Lemma 7.2.** *Let* ⟨A, γ, ≤γ⟩ *be an abstract domain,* 𝒜 ∈ 𝔸_A *and* 𝒱 ∈ 𝕍_A*. Then,* 𝒜 ≅ 𝒱 *if and only if, for any* P ∈ Prog*,* 𝒱**t**(P) = ↑𝒜(P)*.*

*Proof.* By Definition 6.1, it turns out that 𝒜 ⊑ 𝒱 iff, for any P, 𝒱**t**(P) ⊆ ↑𝒜(P), while 𝒱 ⊑ 𝒜 iff, for any P, ↑𝒜(P) ⊆ 𝒱**t**(P). Thus, 𝒜 ≅ 𝒱 if and only if, for any P ∈ Prog, 𝒱**t**(P) = ↑𝒜(P).

A consequence of Lemma 7.2 is that, given 𝒱 ∈ 𝕍_A, 𝒱 can be transformed into an equivalent analyser τ(𝒱) ∈ 𝔸_A if and only if, for any program P, there exists an assertion a_P ∈ A such that 𝒱**t**(P) = ↑a_P. In this case, one can then define τ(𝒱)(P) ≜ a_P.

**Lemma 7.3.** *Let* ⟨A, γ, ≤γ⟩ *be an abstract domain and* 𝒱 ∈ 𝕍_A*. If* 𝒜 ∈ 𝔸_A *is such that* 𝒜 ≅ 𝒱 *then:* (1) A ≠ ∅*;* (2) 𝒱 *is not trivial;* (3) 𝒱 *is monotone.*

*Proof.* (1) We observed just after Definition 4.1 that no analyser can be defined on the empty abstract domain. (2) If 𝒱 is trivial then there exists a program Q ∈ Prog such that, for any a ∈ A, 𝒱(Q, a) = **?**, so that if 𝒱 ≅ 𝒜 for some 𝒜 ∈ 𝔸_A then, from 𝒱 ⊑ 𝒜, we would derive 𝒱(Q, 𝒜(Q)) = **t**, which is a contradiction. (3) Assume that 𝒱 is not monotone. Then, there exist Q ∈ Prog and a, a′ ∈ A such that a ∈ 𝒱**t**(Q), a ≤γ a′ but a′ ∉ 𝒱**t**(Q). If 𝒱 ≅ 𝒜 for some 𝒜 ∈ 𝔸_A then, by Lemma 7.2, 𝒱**t**(Q) = ↑𝒜(Q), so that we would have a ∈ ↑𝒜(Q) but a′ ∉ ↑𝒜(Q), which is a contradiction.

We also observe that even for a nontrivial and monotone verifier 𝒱 ∈ 𝕍_A on a finite abstract domain A, it is not guaranteed that an equivalent analyser exists. In fact, if an equivalent analyser 𝒜 exists then, by Lemma 7.2, for any program P, 𝒱**t**(P) must contain a least element, namely, for any program P there must exist a strongest assertion proved by 𝒱 for P.

**Example 7.4.** Consider a sign domain S ≜ {Z≤0, Z≥0, Z} where Z≤0 ≤γ Z and Z≥0 ≤γ Z. For a program such as Q ≡ x := 0, a sound verifier 𝒱 ∈ 𝕍_S could be able to prove all the assertions in S, namely 𝒱**t**(Q) = S. However, there exists no assertion a_Q ∈ S such that 𝒱**t**(Q) = ↑a_Q. Hence, by Lemma 7.2, there exists no analyser in 𝔸_S which is equivalent to 𝒱. Also, if S′ ≜ {Z=0, Z≤0, Z≥0, Z}, so that S′ is a meet-semilattice, and 𝒱′ ∈ 𝕍_S′ is a sound verifier such that 𝒱′**t**(Q) = S′ ∖ {Z=0}, then still, by Lemma 7.2, there exists no analyser in 𝔸_S′ which is equivalent to 𝒱′.
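The obstruction in this example can be checked mechanically: the proved set equals the whole sign domain, which is not the up-set of any single assertion. A small Python sketch (the string names for the sign values are illustrative stand-ins):

```python
# Sketch of the obstruction in Example 7.4: V_t(Q) = S is not ↑a for any
# single a, so by Lemma 7.2 no equivalent analyser exists (illustrative).

S = ['NEG0', 'POS0', 'INT']
ORDER = {('NEG0', 'NEG0'), ('POS0', 'POS0'), ('INT', 'INT'),
         ('NEG0', 'INT'), ('POS0', 'INT')}
leq = lambda a, b: (a, b) in ORDER
up = lambda a: {b for b in S if leq(a, b)}  # the up-set of a

proved = set(S)  # V_t(Q) for a verifier proving every sign assertion of x := 0
candidates = [a for a in proved if up(a) == proved]
print(candidates)  # [] -- no a_Q with V_t(Q) = up(a_Q)
```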

**Definition 7.5.** A verifier 𝒱 ∈ 𝕍_A is *finitely meet-closed* when, for any P ∈ Prog and a, a₁, a₂ ∈ A, if 𝒱(P, a₁) = **t** = 𝒱(P, a₂) and γ(a) = γ(a₁) ∩ γ(a₂) then 𝒱(P, a) = **t**. The following notation will be used: for any domain A,

𝕍⁺_A ≜ {𝒱 ∈ 𝕍_A | 𝒱 is nontrivial, monotone and finitely meet-closed}.

Thus, finitely meet-closed verifiers can prove logical conjunctions of provable assertions.

**Theorem 7.6 (Reduction for Finite Domains).** *Let* ⟨A, γ, ≤γ⟩ *be a nonempty finite abstract domain. There exists a transform* τ : 𝕍⁺_A → 𝔸_A *such that:* (1) τ *is a total recursive function and, for all* 𝒱 ∈ 𝕍⁺_A*,* τ(𝒱) ≅ 𝒱*;* (2) *if* 𝒱 *is sound then* τ(𝒱) *is sound;* (3) τ *is monotonic;* (4) τ *is injective up to* ≅*, i.e.,* τ(𝒱) ≅ τ(𝒱′) *implies* 𝒱 = 𝒱′*.*


*Proof.* (1) Let A = {a₁, ..., a_n} be any enumeration of A, with n ≥ 1. Given 𝒱 ∈ 𝕍⁺_A, we define τ(𝒱) : Prog → A as follows:

$$\tau(\mathcal{V})(P) \triangleq \begin{cases} r := \text{undef}; \\ \textbf{for all } i \in 1..n \textbf{ do} \\ \quad \textbf{if } \big( a_i \in \mathcal{V}_\mathbf{t}(P) \land ( r = \text{undef} \lor a_i \leq_\gamma r ) \big) \textbf{ then } r := a_i; \\ \textbf{output } r \end{cases}$$

Then, it turns out that τ is a total recursive function. Since 𝒱 is a total recursive function, A is finite and ≤γ is decidable, τ(𝒱) is a total recursive function, so that τ(𝒱) ∈ 𝔸_A. Since 𝒱 is not trivial, for any P ∈ Prog, 𝒱**t**(P) ≠ ∅. Also, since A is finite and 𝒱 is finitely meet-closed, there exists some a_k ∈ 𝒱**t**(P) such that 𝒱**t**(P) ⊆ ↑a_k, so that τ(𝒱)(P) outputs some value in A. Moreover, since 𝒱 is monotone, ↑a_k ⊆ 𝒱**t**(P), so that ↑a_k = 𝒱**t**(P). Thus, the above procedure defining τ(𝒱)(P) finds and outputs a_k. Hence, for any P ∈ Prog and a ∈ A, 𝒱(P, a) = **t** ⇔ a ∈ 𝒱**t**(P) ⇔ a ∈ ↑a_k ⇔ a_k ≤γ a ⇔ τ(𝒱)(P) ≤γ a, that is, τ(𝒱) ≅ 𝒱 holds.

(2) By Lemma 6.2, if <sup>V</sup> is sound then the equivalent analyser τ (V) is sound as well.

(3) It turns out that τ is monotonic: if 𝒱 ⊑ 𝒱′ then, by definition, 𝒱′**t**(P) ⊆ 𝒱**t**(P), so that, since 𝒱**t**(P) = ↑τ(𝒱)(P) and 𝒱′**t**(P) = ↑τ(𝒱′)(P), we obtain τ(𝒱)(P) ≤γ τ(𝒱′)(P), namely, τ(𝒱) ⊑ τ(𝒱′) holds.

(4) Assume that τ(𝒱) ≅ τ(𝒱′). Hence, for any P ∈ Prog, γ(τ(𝒱)(P)) = γ(τ(𝒱′)(P)), so that, since 𝒱**t**(P) = ↑τ(𝒱)(P) and 𝒱′**t**(P) = ↑τ(𝒱′)(P), we obtain 𝒱**t**(P) = 𝒱′**t**(P), namely, 𝒱 = 𝒱′.

An example of this reduction of verification to static analysis for finite domains is the encoding of dataflow analysis as model checking shown in [31] (excluding Kildall's constant propagation domain [16]). Let us now focus on infinite domains of assertions.
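The for-loop defining τ(𝒱)(P) in the proof of Theorem 7.6 translates directly into code. Below is a Python sketch for a finite domain; the sign-with-zero domain and the single-program verifier used to exercise it are hypothetical stand-ins.

```python
# Sketch of the transform tau from Theorem 7.6 on a finite domain,
# following the for-loop in the proof (domain and verifier are illustrative).

def make_analyser(verifier, domain, leq_gamma):
    """Given a nontrivial, monotone, finitely meet-closed verifier on a
    finite domain, return the equivalent analyser tau(V)."""
    def analyser(program):
        r = None  # 'undef' in the pseudocode
        for a in domain:  # any enumeration a_1 .. a_n of the domain
            if verifier(program, a) == 't' and (r is None or leq_gamma(a, r)):
                r = a
        return r
    return analyser

# Sign domain with zero: ZERO <= NEG0, POS0 <= INT.
D = ['INT', 'NEG0', 'POS0', 'ZERO']
LE = {('ZERO', 'ZERO'), ('ZERO', 'NEG0'), ('ZERO', 'POS0'), ('ZERO', 'INT'),
      ('NEG0', 'NEG0'), ('NEG0', 'INT'),
      ('POS0', 'POS0'), ('POS0', 'INT'), ('INT', 'INT')}
leq = lambda a, b: (a, b) in LE

# A verifier for the single program "x := 0": it proves exactly the
# assertions above ZERO, so its proved set is the up-set of ZERO.
V = lambda program, a: 't' if leq('ZERO', a) else '?'

tau_V = make_analyser(V, D, leq)
print(tau_V("x := 0"))  # ZERO
```

The loop finds the least proved assertion precisely because, for a verifier in 𝕍⁺_A on a finite domain, the proved set is an up-set of a single element; drop any of the three hypotheses and the returned `r` may not exist or not be least.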

**Lemma 7.7.** *There exists a denumerable infinite abstract domain* ⟨A, γ, ≤γ⟩ *and a verifier* 𝒱 ∈ 𝕍⁺_A *such that, for any analyser* 𝒜 ∈ 𝔸_A*,* 𝒜 ≇ 𝒱*.*

*Proof.* Let us consider the infinite domain T ≜ N ∪ {⊤} together with the following concretization function: γ(⊤) ≜ Prog and, for any n ∈ N,

γ(n) ≜ {P ∈ Prog | P on input 0 converges in n or fewer steps}

where the number of steps is determined by a small-step operational semantics ⇒, as recalled in Sect. 2. Thus, if n, m ∈ N then n ≤γ m iff n ≤_N m, while n ≤γ ⊤. We define a function 𝒱 : Prog × T → {**t**, **?**} as follows:

$$\mathcal{V}(P,a) \triangleq \begin{cases} \mathbf{t} & \text{if } a = \top \\ \mathbf{t} & \text{if } a = n \text{ and } P \text{ on input 0 converges in } n \text{ or fewer steps} \\ \mathbf{?} & \text{if } a = n \text{ and } P \text{ on input 0 does not converge in } n \text{ or fewer steps} \end{cases}$$

Clearly, for any number n ∈ N, the predicate "P on input 0 converges in n or fewer steps" is decidable, where the input 0 could be replaced by any other (finite set of) input value(s). Hence, 𝒱 turns out to be a total recursive function, that is, a verifier on the abstract domain T. In particular, let us remark that 𝒱 is a sound verifier. Moreover, 𝒱 is nontrivial, since, for any P ∈ Prog, 𝒱(P, ⊤) = **t**, and monotone, because if 𝒱(P, n) = **t** and n ≤γ a then either a = ⊤ and 𝒱(P, ⊤) = **t**, or a = m, so that n ≤_N m and therefore 𝒱(P, m) = **t**. Clearly, 𝒱 is also finitely meet-closed, because if 𝒱(P, a₁) = **t** = 𝒱(P, a₂) and γ(a) = γ(a₁) ∩ γ(a₂) then either a = a₁ or a = a₂, so that 𝒱(P, a) = **t**. Summing up, it turns out that 𝒱 ∈ 𝕍⁺_T. Assume now, by contradiction, that there exists an analyser 𝒜 ∈ 𝔸_T such that 𝒜 ≅ 𝒱. By Lemma 7.2, for any P ∈ Prog, we have that 𝒱**t**(P) = ↑𝒜(P). Hence, if P on input 0 diverges then 𝒱**t**(P) = {⊤}, so that 𝒜(P) = ⊤, while if P on input 0 converges in exactly n steps then 𝒱**t**(P) = {m ∈ N | m ≥ n} ∪ {⊤}, so 𝒜(P) = n; namely, 𝒜 goes as follows:

$$\mathcal{A}(P) = \begin{cases} \top & \text{if } P \text{ on input 0 diverges} \\ n & \text{if } P \text{ on input 0 converges in exactly } n \text{ steps} \end{cases}$$

Since 𝒜 is a total recursive function, we would thus have defined an algorithm for deciding whether a program P ∈ Prog on input 0 terminates or not. Since Prog is assumed to be Turing complete with respect to the operational semantics ⇒, this is a contradiction.
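The decidability of step-bounded convergence is what makes 𝒱 a total recursive function. The Python sketch below models a program as a step function on states, a stand-in for the small-step semantics ⇒; the countdown program and all names are illustrative assumptions.

```python
# Sketch of the verifier V from Lemma 7.7. A "program" is modelled as a
# step function state -> next state, with None meaning "terminated"
# (an illustrative stand-in for the small-step semantics).

TOP = 'top'

def converges_within(step, state, n):
    """Decidable predicate: does the run from `state` stop in <= n steps?"""
    for _ in range(n):
        if state is None:
            return True
        state = step(state)
    return state is None

def V(step, init, a):
    if a == TOP:
        return 't'  # gamma(top) = Prog, so every program satisfies it
    return 't' if converges_within(step, init, a) else '?'

# Countdown program: x := 5; while x > 0 do x := x - 1  (6 steps to halt)
step = lambda x: (x - 1) if x > 0 else None
print(V(step, 5, TOP))  # t
print(V(step, 5, 6))    # t
print(V(step, 5, 3))    # ?
```

Note that the verifier never needs to decide termination outright, only bounded termination; the lemma shows that an equivalent analyser would have to decide unbounded termination, which is impossible.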

As a direct consequence of Lemma 7.7, the following theorem proves that for any denumerable infinite abstract domain A, no reduction from verifiers in 𝕍⁺_A to equivalent analysers in 𝔸_A is possible.

**Theorem 7.8 (Impossibility of the Reduction for Infinite Domains).** *For any denumerable infinite abstract domain* ⟨A, γ, ≤γ⟩*, there exists no function* τ : 𝕍⁺_A → 𝔸_A *such that* τ *is a total recursive function and, for all* 𝒱 ∈ 𝕍⁺_A*,* τ(𝒱) ≅ 𝒱*.*

*Proof.* Assume, by contradiction, that τ : 𝕍⁺_A → 𝔸_A is a total recursive function such that, for all 𝒱 ∈ 𝕍⁺_A, τ(𝒱) ∈ 𝔸_A and τ(𝒱) ≅ 𝒱. Then, for the infinite domain A and verifier 𝒱 ∈ 𝕍⁺_A provided by Lemma 7.7, we would be able to construct an analyser τ(𝒱) ∈ 𝔸_A such that τ(𝒱) ≅ 𝒱, which is in contradiction with Lemma 7.7.

Intuitively, this result states that, given any infinite abstract domain A, no general algorithm exists for constructively deriving, from a reasonable (i.e., nontrivial, monotone and finitely meet-closed) verifier 𝒱 on A, an equivalent analyser on the same domain A. This can be read as a precise statement proving the folklore belief that "program analysis is harder than verification", at least for infinite domains of program assertions. It is important to remark that the verifier 𝒱 ∈ 𝕍⁺_A on the infinite domain A defined in the proof of Lemma 7.7 is sound. Thus, even if we restrict the reduction transform τ of Theorem 7.8 to sound verifiers in 𝕍⁺_A (so that, by Lemma 6.2, its range would be the sound analysers in 𝔸_A), the same proof of Lemma 7.7 could still be used to show that such a transform τ cannot exist.

A further consequence of Theorem 7.8 is the fact proved in [10] that abstract interpretation-based program analysis with infinite domains and widening/narrowing operators is strictly more powerful than with finite domains.

# **8 Conclusion and Future Work**

We put forward a general model for studying static program analysers and verifiers from a computability perspective. This allowed us to state and prove, with simple arguments borrowed from standard computability theory, that for infinite abstract domains of program assertions, program analysis is a harder problem than program verification. This is, to the best of our knowledge, the first formalization and proof of this popular belief, which also includes the relationship between type inference and type checking. We think that this foundational model can be extended to study further properties of program analysers and verifiers. In particular, this opens interesting perspectives in reasoning about program analysis and verification in a more abstract way towards a theory of computation that may include approximate methods, such as program analysers and verifiers, as objects of investigation, as suggested in [5,14]. For instance, the precision of program analysis and program verification, as well as their computational complexity, are intensional program properties. Intensionally different but extensionally equivalent programs may exhibit completely different behaviours when analysed or verified. In this perspective, new intensional versions of Rice's Theorem can be stated for program analysis, similarly to what is known for Blum's complexity in [2]. Also, new models for reasoning about the space and time complexities of program analysis and verification algorithms can be studied, especially for defining a notion of complexity class of program analysers and verifiers.

### **References**



Theory and Security

# **Automata vs Linear-Programming Discounted-Sum Inclusion**

Suguman Bansal, Swarat Chaudhuri, and Moshe Y. Vardi

> Rice University, Houston, TX 77005, USA suguman@rice.edu

**Abstract.** The problem of *quantitative inclusion* formalizes the goal of comparing quantitative dimensions between systems such as worst-case execution time, resource consumption, and the like. Such systems are typically represented by formalisms such as weighted logics or weighted automata. Despite its significance in analyzing the quality of computing systems, the study of quantitative inclusion has mostly been conducted from a theoretical standpoint. In this work, we conduct the first empirical study of quantitative inclusion for discounted-sum weighted automata (DS-inclusion, in short).

Currently, two contrasting approaches to DS-inclusion exist: the linear-programming-based DetLP and the purely automata-theoretic BCV. The theoretical complexity of DetLP is exponential in time and space, while BCV is PSPACE-complete. All practical implementations of BCV, however, are also exponential in time and space. Hence, it is not clear which of the two algorithms renders a superior implementation.

In this work we present the first implementations of these algorithms, and perform extensive experimentation to compare between the two approaches. Our empirical analysis shows how the two approaches complement each other. This is a nuanced picture that is much richer than the one obtained from the theoretical study alone.

# **1 Introduction**

The analysis of quantitative dimensions of systems, such as worst-case execution time, energy consumption, and the like, has been studied thoroughly in recent times. By and large, these investigations have tended to be purely theoretical. While some efforts in this space [12,13] do deliver prototype tools, the area lacks a thorough empirical understanding of the relative performance of different but related algorithmic solutions. In this paper, we further such an empirical understanding for *quantitative inclusion* for *discounted-sum weighted automata*.

Weighted automata [17] are a popular choice for system models in quantitative analysis. The problem of quantitative language inclusion [15] formalizes the goal of determining which of two given systems is more efficient under such a system model. In discounted-sum weighted automata, the values of quantitative dimensions are computed by *aggregating* the costs incurred during each step of a system execution with discounted-sum aggregation. The discounted-sum (DS) function relies on the intuition that costs incurred in the near future are more "expensive" than costs incurred later on. Naturally, it is the aggregation of choice for applications in economics and game theory [20], Markov decision processes with discounted rewards [16], quantitative safety [13], and more.

The hardness of quantitative inclusion for nondeterministic DS-automata, or DS-inclusion, is evident from the PSPACE-hardness of the language-inclusion (LI) problem for nondeterministic Büchi automata [23]. Decision procedures for DS-inclusion were first investigated in [15], and subsequently via target discounted-sum [11] and DS-determinization [10]. A comparator-based argument [9] finally established its PSPACE-completeness. However, these theoretical advances in DS-inclusion have not been accompanied by the development of efficient and scalable tools and algorithms. This is the focus of this paper; our goal is to develop practical algorithms and tools for DS-inclusion.

Theoretical advances have led to two algorithmic approaches to DS-inclusion. The first approach, referred to as DetLP, combines automata-theoretic reasoning with linear programming (LP). This method first determinizes the DS-automata [10], and then reduces the problem of DS-inclusion for deterministic DS-automata to LP [7,8]. Since determinization of DS-automata causes an exponential blow-up, DetLP yields an exponential-time algorithm. An essential feature of this approach is the separation of automata-theoretic reasoning (determinization) from numerical reasoning, which is performed by an LP-solver. Because of this separation, it does not seem easy to apply on-the-fly techniques to this approach and perform it in polynomial space, so this approach uses exponential time and space.

In contrast, the second algorithm for DS-inclusion, referred to as BCV (after the names of its authors), is purely automata-theoretic [9]. The component of numerical reasoning between costs of executions is handled by a special Büchi automaton, called the *comparator*, that enables an on-line comparison of the discounted-sums of a pair of weight-sequences. Aided by the comparator, BCV reduces DS-inclusion to language-equivalence between Büchi automata. Since language-equivalence is in PSPACE, BCV is a polynomial-space algorithm.

While the complexity-theoretic argument may seem to suggest a clear advantage for the pure automata-theoretic approach of BCV, the perspective from an implementation point of view is more nuanced. BCV relies on LI-solvers as its key algorithmic component. The polynomial-space approach for LI relies on Savitch's Theorem, which proves the equivalence between deterministic and nondeterministic space complexity [21]. This theorem, however, does not yield a practical algorithm. Existing efficient LI-solvers [3,4] are based on Ramsey-based inclusion testing [6] or rank-based approaches [18]. These tools actually use exponential time and space. In fact, the exponential blow-up of the Ramsey-based approach seems to be worse than that of DS-determinization. Thus, the theoretical advantage of BCV seems to evaporate upon close examination, and it is far from clear which algorithmic approach is superior. To resolve this issue, we provide in this paper the first implementations of both algorithms and perform exhaustive empirical analysis to compare their performance.

Our first tool, also called DetLP, implements its namesake algorithm as is. We rely on the existing LP-solver GLPSOL to perform numerical reasoning. Our second tool, called QuIP, starts from BCV but improves on it. The key improvement arises from the construction of an improved comparator with fewer states. We revisit the reduction to language inclusion in [9] accordingly. The new reduction lowers the transition density of the inputs to the LI-solver (transition density is the ratio of transitions to states), improving the overall performance of QuIP, since LI-solvers are known to scale better on inputs of lower transition density [19].

Our empirical analysis reveals that theoretical complexity does not provide a full picture. Despite its poorer complexity, QuIP scales significantly better than DetLP, although DetLP solves more benchmarks. Based on these observations, we propose a method for DS-inclusion that leverages the complementary strengths of these tools to offer a scalable tool for DS-inclusion. Our evaluation also highlights the limitations of both approaches, and opens directions for further research in improving tools for DS-inclusion.

# **2 Preliminaries**

**Büchi Automata.** A *Büchi automaton* [23] is a tuple A = (*S*, Σ, δ, *Init*, F), where *S* is a finite set of *states*, Σ is a finite *input alphabet*, δ ⊆ (*S* × Σ × *S*) is the *transition relation*, *Init* ⊆ *S* is the set of *initial states*, and F ⊆ *S* is the set of *accepting states*. A Büchi automaton is *deterministic* if for all states s and inputs a, |{s′ | (s, a, s′) ∈ δ}| ≤ 1; otherwise, it is *nondeterministic*. For a word w = w₀w₁... ∈ Σ^ω, a *run* ρ of w is a sequence of states s₀s₁... satisfying: (1) s₀ ∈ *Init*, and (2) τᵢ = (sᵢ, wᵢ, sᵢ₊₁) ∈ δ for all i. Let *inf*(ρ) denote the set of states that occur infinitely often in run ρ. A run ρ is an *accepting run* if *inf*(ρ) ∩ F ≠ ∅. A word w is an *accepting word* if it has an accepting run.

The language L(A) of a Büchi automaton A is the set of all words accepted by it. Büchi automata are known to be closed under set-theoretic union, intersection, and complementation. For Büchi automata A and B, the *language-equivalence* and *language-inclusion* problems ask whether L(A) ≡ L(B) and L(A) ⊆ L(B), respectively.

Let A = A[0], A[1], ... be a natural-number sequence and d > 1 a rational number. The *discounted-sum* of A with discount-factor d is *DS*(A, d) = Σ_{i≥0} A[i]/d^i. For number sequences A and B, (A, B) and (A − B) denote the sequences whose i-th elements are (A[i], B[i]) and A[i] − B[i], respectively.
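As a quick numeric illustration of the discounted sum (truncating to a finite prefix, which is only an illustrative approximation, since actual discounted sums range over infinite sequences):

```python
# DS(A, d) = sum_{i >= 0} A[i] / d^i, evaluated on a finite prefix.

def discounted_sum(prefix, d):
    return sum(w / d**i for i, w in enumerate(prefix))

# The constant sequence 1, 1, 1, ... with d = 2 converges to d/(d-1) = 2;
# its length-4 prefix already gives 1 + 1/2 + 1/4 + 1/8.
print(discounted_sum([1, 1, 1, 1], 2))  # 1.875
```

Because d > 1, later weights contribute geometrically less, which is why the infinite sum always converges for bounded weight sequences.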

**Discounted-Sum Automata.** A *discounted-sum automaton* with discount-factor d > 1, *DS-automaton* in short, is a tuple A = (M, γ), where M = (*S*, Σ, δ, *Init*, *S*) is a Büchi automaton and γ : δ → N is the *weight function* that assigns a weight to each transition of automaton M. *Words* and *runs* in weighted ω-automata are defined as they are in Büchi automata. Note that all states are accepting states in this definition. The *weight sequence* of run ρ = s₀s₁... of word w = w₀w₁... is given by wt_ρ = n₀n₁n₂... where nᵢ = γ(sᵢ, wᵢ, sᵢ₊₁) for all i. The *weight of a run* ρ is given by *DS*(wt_ρ, d); for simplicity, we denote this by *DS*(ρ, d). The *weight of a word* in a DS-automaton is defined as wt_A(w) = sup{*DS*(ρ, d) | ρ is a run of w in A}. By convention, if a word w ∉ L(A), then

**Fig. 1.** System <sup>S</sup>

**Fig. 2.** Specification <sup>P</sup>

wt_A(w) = 0 [15]. A DS-automaton is *complete* if from every state there is at least one transition on every letter; formally, for all p ∈ S and all a ∈ Σ, there exists q ∈ S s.t. (p, a, q) ∈ δ. A run ρ in P of word w ∈ L(P) is a *diminished run* if there exists a run σ in Q over the same word w s.t. DS(ρ, d) < DS(σ, d). We abuse notation and write w ∈ A to mean w ∈ L(A) for a Büchi automaton or DS-automaton A. We limit ourselves to integer discount-factors only. Given DS-automata P and Q and discount-factor d > 1, the *discounted-sum inclusion problem*, denoted by P ⊆_d Q, determines whether wt_P(w) ≤ wt_Q(w) for all words w ∈ Σ^ω.
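To make wt_A(w) concrete, the following brute-force sketch (the tiny example automaton and all names are ours) approximates the supremum over runs by maximizing over run prefixes of length k; truncation changes a discounted sum by at most μ/(d^(k−1)(d−1)) for maximum weight μ:

```python
from fractions import Fraction

def approx_weight(delta, gamma, init, period, k, d):
    """Approximate wt_A(w) = sup over runs rho of DS(wt_rho, d) for the
    lasso word w = period^omega, by maximizing over run prefixes of
    length k (exact rational arithmetic, depth-first enumeration)."""
    d = Fraction(d)
    best, found = Fraction(0), False
    def go(state, i, acc):
        nonlocal best, found
        if i == k:
            if not found or acc > best:
                best, found = acc, True
            return
        for (s, a, t) in delta:
            if s == state and a == period[i % len(period)]:
                go(t, i + 1, acc + Fraction(gamma[(s, a, t)]) / d ** i)
    go(init, 0, Fraction(0))
    return best

# Runs of (ab)^omega: stay in q0 (weights 1) or branch to q1 on b (weight 2,
# then weights 0).  For d = 2, both families of runs have weight 2.
delta = {("q0", "a", "q0"), ("q0", "b", "q0"), ("q0", "b", "q1"),
         ("q1", "a", "q1"), ("q1", "b", "q1")}
gamma = {("q0", "a", "q0"): 1, ("q0", "b", "q0"): 1, ("q0", "b", "q1"): 2,
         ("q1", "a", "q1"): 0, ("q1", "b", "q1"): 0}
print(approx_weight(delta, gamma, "q0", ["a", "b"], 10, 2))
```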

**Comparator Automata.** For natural number μ, integer discount-factor d > 1, and inequality relation ≤, the *discounted-sum comparator* A^{μ,d}_≤, *comparator* in short, is a Büchi automaton that accepts (infinite) words over the alphabet {0, 1, …, μ − 1} × {0, 1, …, μ − 1} such that a pair (A, B) of sequences is in L(A^{μ,d}_≤) iff DS(A, d) ≤ DS(B, d). Closure properties of Büchi automata ensure that a comparator exists for every inequality relation [9].

**Motivating Example.** As an example of such a problem formulation, consider the system and specification in Figs. 1 and 2, respectively [15]. Here, the specification P depicts the worst-case energy-consumption model for a motor, and the system S is a candidate implementation of the motor. Transitions in S and P are labeled by transition-action and transition-cost. The cost of an execution (a sequence of actions) is given by an *aggregate* of the costs of transitions along its run (a sequence of automaton states). In nondeterministic automata, where each execution may have multiple runs, the cost of the execution is the cost of the run with maximum cost. A critical question here is to check whether implementation S is more energy-efficient than specification P. This problem can be framed as a problem of quantitative inclusion between S and P.

# **3 Prior Work**

We discuss the existing algorithms for DS-inclusion, i.e., DetLP and BCV, in detail.

### **3.1 DetLP: DS-determinization and LP-based**

Böker and Henzinger studied the complexity of and decision procedures for determinization of DS-automata in detail [10]. They proved that a DS-automaton can be determinized if it is complete, all its states are accepting, and the discount-factor is an integer; under all other circumstances, DS-determinization is not guaranteed. DS-determinization extends subset-construction for automata over finite words. Every state of the determinized DS-automaton is represented by an |S|-tuple of numbers, where S = {q_1, …, q_|S|} denotes the set of states of the original DS-automaton. The value stored in the i-th place of the |S|-tuple represents the "gap", or extra cost, of reaching state q_i over a finite word w compared to the best value so far. The crux of the argument lies in proving that when the DS-automaton is complete and the discount-factor is an integer, the gap can take only finitely many values, yielding finiteness of the determinized DS-automaton, albeit exponentially larger than the original.
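The finiteness of the gap for integer d can be observed empirically. The sketch below (the toy automaton is ours; gaps exceeding μ·d/(d−1) can never "catch up" and are collapsed to ∞, mirroring the capping argument of [10]) enumerates all words up to a fixed length and collects the distinct scaled gaps:

```python
import math
from fractions import Fraction
from itertools import product

d, mu = 2, 2
# a complete nondeterministic DS-automaton over {a, b}, initial state "p"
delta = {("p", "a", "p"): 2, ("p", "a", "q"): 1, ("p", "b", "q"): 0,
         ("q", "a", "p"): 0, ("q", "b", "q"): 2, ("q", "b", "p"): 1}
cap = Fraction(mu * d, d - 1)   # gaps beyond this are unrecoverable

def best_values(word):
    """val[q] = maximum discounted cost over runs from 'p' reading word."""
    val = {"p": Fraction(0)}
    for i, a in enumerate(word):
        nxt = {}
        for (s, x, t), w in delta.items():
            if x == a and s in val:
                c = val[s] + Fraction(w, d ** i)
                if t not in nxt or c > nxt[t]:
                    nxt[t] = c
        val = nxt
    return val

gaps = set()
for n in range(1, 9):
    for word in product("ab", repeat=n):
        val = best_values(word)
        top = max(val.values())
        for v in val.values():
            g = d ** n * (top - v)          # scaled gap, always an integer
            gaps.add(g if g <= cap else math.inf)

print(sorted(gaps, key=float))  # a small finite set, despite 2^8 words
```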

**Theorem 1** [10] [DS-determinization analysis]*. Let* A *be a complete DS-automaton with maximum weight* μ *over transitions and* s *states. DS-determinization of* A *generates a DS-automaton with at most* μ^s *states.*

Chatterjee et al. reduced P ⊆_d Q between nondeterministic DS-automata P and deterministic DS-automata Q to linear programming [7,8,15]. First, the product DS-automaton P × Q is constructed so that (s_P, s_Q) --a--> (t_P, t_Q) is a transition with weight w_P − w_Q if the transition s_M --a--> t_M with weight w_M is present in M, for M ∈ {P, Q}. P ⊆_d Q is false iff the weight of some word in P × Q is greater than 0. Since Q is deterministic, it suffices to check whether the maximum weight of all infinite paths from the initial state in P × Q is greater than 0. For discounted-sum, the maximum weight of paths from a given state can be determined by a linear program: each variable (one per state) corresponds to the weight of paths originating in that state, and the transitions determine the constraints relating the values of these variables. The objective is to maximize the value of the variable corresponding to the initial state.
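As a sketch of the underlying fixed point (we use value iteration in place of the LP of [15]; the example graph is ours), the maximum discounted weight of infinite paths satisfies v(s) = max over transitions (s, w, t) of w + v(t)/d, which is a contraction for d > 1:

```python
def max_discounted_value(transitions, states, d, iters=60):
    """Iterate v[s] = max over edges (s, w, t) of w + v[t] / d; for d > 1
    this contraction converges to the maximum discounted weight of
    infinite paths starting at each state."""
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {s: max(w + v[t] / d for (u, w, t) in transitions if u == s)
             for s in states}
    return v

# Self-loop of weight 1 at state 0 (value 2) vs. a weight-3 edge into a
# zero-weight loop at state 1 (value 3); the maximum picks the latter.
trans = [(0, 1, 0), (0, 3, 1), (1, 0, 1)]
v = max_discounted_value(trans, [0, 1], d=2)
print(v[0])
```

In the DetLP reduction, P ⊆_d Q fails exactly when this value at the initial state of the product exceeds 0.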

Therefore, the DetLP method for P ⊆_d Q is as follows: determinize Q to Q_D via the DS-determinization method of [10], and reduce P ⊆_d Q_D to linear programming following [15]. Note that since determinization is possible only if the DS-automaton is complete, DetLP can be applied only if Q is complete.

**Lemma 1.** *Let* P *and* Q *be nondeterministic DS-automata with* s_P *and* s_Q *states, respectively, and* τ_P *transitions in* P*. Let the alphabet be* Σ *and the maximum weight on transitions be* μ*. Then* P ⊆_d Q *reduces to linear programming with* O(s_P · μ^{s_Q}) *variables and* O(τ_P · μ^{s_Q} · |Σ|) *constraints.*

Anderson and Conitzer [7] proved that this system of linear equations can be solved in O(m · n²) time for m constraints and n variables. Therefore:

**Theorem 2** [7,15] [Complexity of DetLP]*. Let* P *and* Q *be DS-automata with* s_P *and* s_Q *states, respectively, and* τ_P *transitions in* P*. Let the alphabet be* Σ *and the maximum weight on transitions be* μ*. The complexity of DetLP is* O(s_P² · τ_P · μ^{s_Q} · |Σ|)*.*


```
Algorithm 1. BCV(P, Q, d), Is P ⊆d Q?
...
10: return Pˆ ≡ Dim
```
### **3.2 BCV: Comparator-based approach**

The key idea behind BCV is that P ⊆_d Q holds iff every run of P is a diminished run. As a result, BCV constructs an intermediate Büchi automaton *Dim* that consists of all diminished runs of P. It then checks whether *Dim* consists of all runs of P by determining language-equivalence between *Dim* and an automaton Pˆ that consists of all runs of P. The comparator A^{μ,d}_≤ is utilized in the construction of *Dim* to compare the weights of runs in P and Q.

Strictly speaking, BCV as presented in [9] is a generic algorithm for inclusion under a general class of aggregate functions, called ω-regular aggregate functions; here, BCV (Algorithm 1) refers to its adaptation to DS. Procedure AugmentWtAndLabel separates runs of the same word in a DS-automaton by assigning a unique transition-identity to each transition. It also appends the transition weight, to enable weight comparison afterwards. Specifically, it transforms DS-automaton A into Büchi automaton Aˆ, with all states accepting, by converting each transition τ = s --a--> t with weight wt and unique transition-identity l into the transition τˆ = s --(a,wt,l)--> t in Aˆ. Procedure MakeProductSameAlpha(Pˆ, Qˆ) takes the product of Pˆ and Qˆ over the same word, i.e., transitions s_A --(a,n_A,l_A)--> t_A in A, for A ∈ {Pˆ, Qˆ}, generate the transition (s_P, s_Q) --(a,n_P,l_P,n_Q,l_Q)--> (t_P, t_Q) in Pˆ × Qˆ. The comparator A^{μ,d}_≤ is constructed with upper bound μ equal to the maximum weight of transitions in P and Q, and discount-factor d. Intersect matches the alphabet of Pˆ × Qˆ with that of A^{μ,d}_≤, and intersects them. The resulting automaton *DimWithWitness* accepts the word (w, wt_P, id_P, wt_Q, id_Q) iff DS(wt_P, d) ≤ DS(wt_Q, d). The projection of *DimWithWitness* on the first three components of Pˆ returns *Dim*, which contains the word (w, wt_P, id_P) iff it corresponds to a diminished run in P. Finally, language-equivalence between *Dim* and Pˆ returns the answer.
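A minimal sketch of the AugmentWtAndLabel transformation (the procedure name follows the paper; the encoding of letters as tuples is our choice):

```python
def augment_wt_and_label(transitions):
    """Turn weighted transitions (s, a, t, wt) into Buechi transitions whose
    letter carries the weight and a unique transition-identity l, so that
    distinct runs over the same word become distinct words."""
    return [(s, (a, wt, l), t)
            for l, (s, a, t, wt) in enumerate(transitions)]

weighted = [("p", "a", "p", 2), ("p", "a", "q", 1)]
print(augment_wt_and_label(weighted))
# the two runs of 'a' from p now read different letters: (a,2,0) vs (a,1,1)
```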

Unlike DetLP, BCV operates on incomplete DS-automata as well, and can be extended to DS-automata in which not all states are accepting.

# **4 QuIP: BCV-based Solver for DS-inclusion**

We investigate more closely why BCV does not lend itself to a practical implementation for DS-inclusion (Sect. 4.1). We identify its drawbacks and propose an improved algorithm, QuIP, described in Sect. 4.3. QuIP improves upon BCV by means of a new optimized comparator that we describe in Sect. 4.2.

### **4.1 Analysis of BCV**

The proof of the PSPACE-complexity of BCV relies on language-inclusion (LI) being in PSPACE. In practice, though, implementations of LI apply Ramsey-based inclusion testing [6], rank-based methods [18], etc. All of these algorithms are exponential in time and space in the worst case. Any implementation of BCV has to rely on an LI-solver; therefore, in practice BCV is also exponential in time and space. In fact, we show that its worst-case complexity in practice is poorer than that of DetLP.

Another obstacle to a practical implementation of BCV is that it does not optimize the size of the intermediate automata. Specifically, we show that the size and transition-density of *Dim*, one of the inputs to the LI-solver, are very high (transition-density is the ratio of transitions to states). Both of these parameters are known to be deterrents to the performance of existing LI-solvers [5], and subsequently to BCV as well:

**Lemma 2.** *Let* s_P*,* s_Q*,* s_d *and* τ_P*,* τ_Q*,* τ_d *denote the number of states and transitions in* P*,* Q*, and* A^{μ,d}_≤*, respectively. The number of states and transitions in Dim are* O(s_P s_Q s_d) *and* O(τ_P² τ_Q² τ_d |Σ|)*, respectively.*

*Proof.* It is easy to see that the numbers of states and transitions of Pˆ and Qˆ are the same as those of P and Q, respectively. Therefore, the numbers of states and transitions in Pˆ × Qˆ are O(s_P s_Q) and O(τ_P τ_Q), respectively. The alphabet of Pˆ × Qˆ is of the form (a, wt_1, id_1, wt_2, id_2), where a ∈ Σ, wt_1 and wt_2 are non-negative weights bounded by μ, and id_1, id_2 are unique transition-ids in P and Q, respectively. The alphabet of comparator A^{μ,d}_≤ is of the form (wt_1, wt_2). To intersect the two, the alphabet of the comparator needs to be matched to that of the product, causing a blow-up in the number of transitions of the comparator by a factor of |Σ| · τ_P · τ_Q. Therefore, the numbers of states and transitions in *DimWithWitness* and *Dim* are O(s_P s_Q s_d) and O(τ_P² τ_Q² τ_d |Σ|), respectively.

The comparator is a nondeterministic Büchi automaton with O(μ²) states over an alphabet of size μ² [9]. Since transition-density δ ≤ |S| · |Σ| for nondeterministic Büchi automata, the transition-density of the comparator is O(μ⁴). Therefore:

**Corollary 1.** *Let* s_P*,* s_Q*,* s_d *denote the number of states in* P*,* Q*,* A^{μ,d}_≤*, respectively, and* δ_P*,* δ_Q*,* δ_d *their transition-densities. The number of states and the transition-density of Dim are* O(s_P s_Q μ²) *and* O(δ_P δ_Q τ_P τ_Q · μ⁴ · |Σ|)*, respectively.*

The corollary illustrates that the transition-density of *Dim* is very high even for small inputs. The blow-up in the number of transitions of *DimWithWitness* (hence *Dim*) occurs during alphabet-matching for Büchi automata intersection (Algorithm 1, Line 8). However, the blow-up can be avoided by performing intersection over a substring of the alphabet of Pˆ × Qˆ. Specifically, if s_1 --(a,n_P,id_P,n_Q,id_Q)--> s_2 and t_1 --(wt_1,wt_2)--> t_2 are transitions in Pˆ × Qˆ and comparator A^{μ,d}_≤, respectively, then (s_1, t_1, i) --(a,n_P,id_P,n_Q,id_Q)--> (s_2, t_2, j) is a transition in the intersection iff n_P = wt_1 and n_Q = wt_2, where j = (i + 1) mod 2 if either s_1 or t_1 is an accepting state, and j = i otherwise. We call this intersection over a substring of the alphabet IntersectSelectAlpha. The following is easy to prove:
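A sketch of IntersectSelectAlpha as described above (the data layout is ours; the index-update rule is copied verbatim from the text):

```python
def intersect_select_alpha(prod_trans, comp_trans, prod_acc, comp_acc):
    """Match only the weight components (n_P, n_Q) of the product letter
    against the comparator letter (wt1, wt2); the third state component i
    is the usual Buechi-intersection round-robin bit."""
    out = []
    for (s1, (a, nP, idP, nQ, idQ), s2) in prod_trans:
        for (t1, (w1, w2), t2) in comp_trans:
            if nP == w1 and nQ == w2:
                for i in (0, 1):
                    j = (i + 1) % 2 if (s1 in prod_acc or t1 in comp_acc) else i
                    out.append(((s1, t1, i), (a, nP, idP, nQ, idQ),
                                (s2, t2, j)))
    return out

prod = [("u", ("a", 1, 0, 2, 7), "u")]
comp = [("x", (1, 2), "x"), ("x", (0, 2), "x")]
res = intersect_select_alpha(prod, comp, {"u"}, set())
print(len(res))  # only the (1, 2) comparator transition matches: 2 copies
```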

**Lemma 3.** *Let* A_1 = Intersect(Pˆ × Qˆ, A^{μ,d}_≤) *and* A_2 = IntersectSelectAlpha(Pˆ × Qˆ, A^{μ,d}_≤)*.* Intersect *extends the alphabet of* A^{μ,d}_≤ *to match the alphabet of* Pˆ × Qˆ*, and* IntersectSelectAlpha *selects a substring of the alphabet of* Pˆ × Qˆ *as defined above. Then* L(A_1) ≡ L(A_2)*.*

IntersectSelectAlpha prevents the blow-up by |Σ| · τ_P · τ_Q, resulting in only O(τ_P τ_Q τ_d) transitions in *Dim*. Therefore:

**Lemma 4** [Trans. Den. in BCV]*. Let* δ_P*,* δ_Q *denote the transition-densities of* P *and* Q*, resp., and* μ *the upper bound of comparator* A^{μ,d}_≤*. The number of states and the transition-density of Dim are* O(s_P s_Q μ²) *and* O(δ_P δ_Q · μ⁴)*, respectively.*

Language-equivalence is performed via tools for language-inclusion. The most effective tool for language-inclusion, RABIT [1], is based on Ramsey-based inclusion testing [6]. The worst-case complexity of checking A ⊆ B via Ramsey-based inclusion testing is known to be 2^{O(n²)} when B has n states. Therefore:

**Theorem 3** [Practical complexity of BCV]*. Let* P *and* Q *be DS-automata with* s_P *and* s_Q *states, respectively, and maximum weight on transitions* μ*. The worst-case complexity of* BCV *for integer discount-factor* d > 1*, when language-equivalence is performed via Ramsey-based inclusion testing, is* 2^{O(s_P² · s_Q² · μ⁴)}*.*

Recall that the language-inclusion queries are Pˆ ⊆ *Dim* and *Dim* ⊆ Pˆ. Since *Dim* has many more states than Pˆ, the complexity of Pˆ ⊆ *Dim* dominates.

Theorems 2 and 3 demonstrate that the complexity of BCV (in practice) is worse than that of DetLP.

### **4.2 Baseline Automata: An Optimized Comparator**

The 2^{O(s²)} dependence of BCV on the number of states s of the comparator motivates us to construct a more compact comparator. The comparator of [9] consists of O(μ²) states for upper bound μ. In this section, we introduce the related concept of *baseline automata*, which consist of only O(μ) states and have transition-density O(μ²).

**Definition 1 (Baseline automata).** *For natural number* μ*, integer discount-factor* d > 1*, and relation* R ∈ {≤, ≥, <, >, =}*, the* DS baseline automaton B^{μ,d}_R*, baseline in short, is a Büchi automaton that accepts (infinite) words over the alphabet* {−(μ − 1), …, μ − 1} *s.t. a sequence* V ∈ L(B^{μ,d}_R) *iff* DS(V, d) R 0*.*

Semantically, the baseline automaton with upper bound μ, discount-factor d, and inequality relation R accepts the language of all integer sequences bounded by μ whose discounted-sum is related to 0 by R. Baseline automata are also related to *cut-point languages* [14].

Since DS(A, d) ≤ DS(B, d) iff DS(A − B, d) ≤ 0, A^{μ,d}_≤ accepts (A, B) iff B^{μ,d}_≤ accepts (A − B); the regularity of baseline automata thus follows immediately from the regularity of comparators. In fact, an automaton for B^{μ,d}_≤ can be derived from A^{μ,d}_≤ by transforming the alphabet from (a, b) to (a − b) along every transition. The first benefit of the modified alphabet is that its size is reduced from μ² to 2 · μ − 1. In addition, it coalesces all transitions between any two states over letters (a, a + v), for all a, into one single transition over v, thereby also reducing the number of transitions. However, this direct transformation results in a baseline with O(μ²) states. We provide a construction of the baseline with only O(μ) states.
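The alphabet transformation can be sketched directly (the one-state comparator below is a hypothetical placeholder; the point is only the count 2·μ − 1 versus μ²):

```python
def comparator_to_baseline(comp_trans):
    """Map a comparator transition over letter (a, b) to a baseline
    transition over the single integer a - b; transitions that differ
    only by a common offset in (a, b) coalesce."""
    return {(s, a - b, t) for (s, (a, b), t) in comp_trans}

mu = 3
letters = [(a, b) for a in range(mu) for b in range(mu)]  # mu**2 = 9 letters
comp = [("s", ab, "s") for ab in letters]
base = comparator_to_baseline(comp)
print(len(comp), len(base))  # 9 transitions coalesce into 2*mu - 1 = 5
```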

The key idea behind the construction of the baseline is that the discounted-sum of a sequence V can be treated as a number in base d, i.e., DS(V, d) = Σ_{i=0}^∞ V[i]/d^i = (V[0].V[1]V[2]…)_d. So, when DS(V, d) ≤ 0, there exists a non-negative value C in base d s.t. V + C = 0 under arithmetic in base d. This value C can be represented by a non-negative sequence C s.t. DS(C, d) + DS(V, d) = 0. Arithmetic in base d over the sequences C and V gives rise to a sequence of carries X such that:

**Lemma 5.** *Let* V, C, X *be number sequences and* d > 1 *a positive integer such that the following equations hold:*

*1. When* i = 0*,* V[0] + C[0] + X[0] = 0

*2. When* i ≥ 1*,* V[i] + C[i] + X[i] = d · X[i − 1]

*Then* DS(V, d) + DS(C, d) = 0*.*
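A concrete instance of Lemma 5 (the sequences are our own choice, checked with exact rationals): for d = 2 and V with DS(V, 2) = −5/2, take C to be the base-2 representation of +5/2 and X the matching carry sequence.

```python
from fractions import Fraction

def ds(seq, d):
    """Discounted sum of a finitely supported sequence (zero afterwards)."""
    return sum(Fraction(x, d ** i) for i, x in enumerate(seq))

d = 2
V = [-3, 1, 0, 0]   # DS(V, 2) = -3 + 1/2 = -5/2
C = [2, 1, 0, 0]    # base-2 representation of +5/2: integer part 2, digit 1
X = [1, 0, 0, 0]    # carries

assert V[0] + C[0] + X[0] == 0                  # condition 1 (i = 0)
for i in range(1, 4):                           # condition 2 (i >= 1)
    assert V[i] + C[i] + X[i] == d * X[i - 1]
print(ds(V, d) + ds(C, d))  # -> 0
```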

In the construction of the comparator, it has been proven that when A and B are bounded non-negative integer sequences s.t. DS(A, d) ≤ DS(B, d), the corresponding sequences C and X are also bounded integer sequences [9]. The same argument carries over here: when V is a bounded integer sequence s.t. DS(V, d) ≤ 0, there exists a corresponding pair of bounded integer sequences C and X. In fact, the bounds used for the comparator carry over to this case as well: sequence C is non-negative and bounded by μ_C = μ · d/(d − 1), since −μ_C is the minimum value of the discounted-sum of V, and the integer sequence X is bounded by μ_X = 1 + μ/(d − 1). Combining Lemma 5 with the bounds on X and C, we get:

**Lemma 6.** *Let* V *be an integer sequence bounded by* μ *s.t.* DS(V, d) ≤ 0*. Then there exists an integer sequence* X *bounded by* (1 + μ/(d − 1)) *s.t.:*

*1. When* i = 0*,* 0 ≤ −(V[0] + X[0]) ≤ μ · d/(d − 1)

*2. When* i ≥ 1*,* 0 ≤ d · X[i − 1] − V[i] − X[i] ≤ μ · d/(d − 1)

Equations 1–2 from Lemma 6 are obtained by expressing C[i] in terms of X[i], X[i − 1], V[i], and d, and imposing non-negativity and the bound μ_C = μ · d/(d − 1) on the resulting expression. Therefore, Lemma 6 implicitly captures the conditions on C by expressing them only in terms of V, X, and d for DS(V, d) ≤ 0 to hold.

In the construction of the baseline automaton, the values V[i] form the alphabet, while the upper bound μ and the discount-factor d are input parameters. The only unknowns are the values X[i]. However, we know that X[i] can take only finitely many values, namely integer values with |X[i]| ≤ μ_X. So we store all possible values of X[i] in the states: the state-space S comprises {(x) : |x| ≤ μ_X} and a start state s. Transitions between these states exist iff the corresponding x-values and letter v satisfy the conditions of Eqs. 1–2 from Lemma 6. There is a transition from start state s to state (x) on letter v if 0 ≤ −(x + v) ≤ μ · d/(d − 1), and from state (x) to state (x′) on letter v if 0 ≤ d · x − v − x′ ≤ μ · d/(d − 1). All (x)-states are accepting. This completes the construction of the baseline automaton B^{μ,d}_≤. Clearly, B^{μ,d}_≤ has only O(μ) states.
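The construction of B^{μ,d}_≤ can be sketched directly; the membership check below is our simplification for eventually-zero words only, so that Büchi acceptance reduces to finding a reachable cycle over the letter 0:

```python
from fractions import Fraction

def baseline_leq(mu, d):
    """Baseline B_{<=} with upper bound mu and integer discount-factor d:
    states x with |x| <= mu_X = 1 + mu/(d-1) plus a start state "s";
    transitions follow the two conditions of Lemma 6."""
    mu_C = Fraction(mu * d, d - 1)
    mu_X = 1 + Fraction(mu, d - 1)
    xs = list(range(-int(mu_X), int(mu_X) + 1))
    delta = []
    for v in range(-(mu - 1), mu):
        for x in xs:
            if 0 <= -(x + v) <= mu_C:            # start transition (i = 0)
                delta.append(("s", v, x))
            for x2 in xs:
                if 0 <= d * x - v - x2 <= mu_C:  # internal transition (i >= 1)
                    delta.append((x, v, x2))
    return xs, delta

def accepts_eventually_zero(prefix, mu, d):
    """Membership for words prefix . 0^omega (non-empty prefix).  All
    x-states are accepting, so acceptance amounts to reaching a state
    that has an infinite run over the letter 0."""
    xs, delta = baseline_leq(mu, d)
    cur = {"s"}
    for v in prefix:
        cur = {t for (s, l, t) in delta if s in cur and l == v}
    succ0 = {x: {t for (s, l, t) in delta if s == x and l == 0} for x in xs}
    alive = set(xs)      # greatest fixed point of "has a 0-successor"
    while True:
        nxt = {x for x in alive if succ0[x] & alive}
        if nxt == alive:
            return bool(cur & alive)
        alive = nxt

# DS((-1, 0, ...), 2) = -1 <= 0 is accepted; DS((1, 0, ...), 2) = 1 is not.
print(accepts_eventually_zero([-1], 2, 2), accepts_eventually_zero([1], 2, 2))
```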

Since Büchi automata are closed under set-theoretic operations, the baseline automaton is ω-regular for all other inequalities too, and the baseline automata for all other inequalities also have O(μ) states. For the sake of completeness, we extend B^{μ,d}_≤ to construct B^{μ,d}_<. For DS(V, d) < 0, we need DS(C, d) > 0 for the (implicitly generated) C. Since C is a non-negative sequence, it suffices that at least one value of C is non-zero. Therefore, all runs are diverted to non-accepting states (x, ⊥) using the same transition conditions as long as the value of c is zero, and a run moves to the accepting states (x) only when it witnesses a non-zero value of c. Formally:

**Construction.** Let μ_C = μ · d/(d − 1) ≤ 2 · μ and μ_X = 1 + μ/(d − 1). B^{μ,d}_< = (S, Σ, δ_d, Init, F), where S = {s} ∪ S_⊥ ∪ F with S_⊥ = {(x, ⊥) : |x| ≤ μ_X} and F = {(x) : |x| ≤ μ_X}, and Init = {s}. The transition relation δ_d consists of:

1. Transitions from start state s:
	- i. (s, v, x) for all x ∈ F s.t. 0 < −(x + v) ≤ μ_C
	- ii. (s, v, (x, ⊥)) for all (x, ⊥) ∈ S_⊥ s.t. x + v = 0
2. Transitions within S_⊥: ((x, ⊥), v, (x′, ⊥)) for all (x, ⊥), (x′, ⊥) ∈ S_⊥ if d · x = v + x′
3. Transitions within F: (x, v, x′) for all x, x′ ∈ F if 0 ≤ d · x − v − x′ < d
4. Transitions between S_⊥ and F: ((x, ⊥), v, x′) for (x, ⊥) ∈ S_⊥, x′ ∈ F if 0 < d · x − v − x′ < d

**Theorem 4** [Baseline]*. The Büchi automaton constructed above is the baseline* B^{μ,d}_< *with upper bound* μ*, integer discount-factor* d > 1*, and relation* <*.*

The baseline automata for all inequality relations have O(μ) states, an alphabet of size 2 · μ − 1, and transition-density O(μ²).


```
Algorithm 2. QuIP(P, Q, d), Is P ⊆d Q?
```
### **4.3 QuIP: Algorithm Description**

The construction of the baseline automaton leads to an implementation-friendly adaptation of BCV, called QuIP. The core focus of QuIP is to ensure that the intermediate automata are small and have few transitions, to assist the LI-solvers. Technically, QuIP differs from BCV by incorporating the baseline automaton and an appropriate IntersectSelectAlpha function, rendering QuIP a theoretical improvement over BCV. Like BCV, QuIP also determines all diminished runs of P. So, it disambiguates P by appending a weight and a unique label to each of its transitions. Since the identity of runs of Q is not important, we do not disambiguate between runs of Q; we only append the weight to each transition (Algorithm 2, Line 4). The baseline automaton is constructed for discount-factor d, the maximum weight μ along transitions in P and Q, and the inequality ≤. Since the alphabet of the baseline automaton consists of integers between −μ and μ, the alphabet of the product Pˆ × Qˆ is adjusted accordingly. Specifically, the weight recorded along a transition of the product is the difference of the weight in Pˆ and that in Qˆ, i.e., if τ_P : s_1 --(a_1,wt_1,l)--> s_2 and τ_Q : t_1 --(a_2,wt_2)--> t_2 are transitions in Pˆ and Qˆ respectively, then τ = (s_1, t_1) --(a_1,wt_1−wt_2,l)--> (s_2, t_2) is a transition in Pˆ × Qˆ iff a_1 = a_2 (Algorithm 2, Line 5). Here, IntersectSelectAlpha intersects the baseline automaton and the product Pˆ × Qˆ only on the weight-component of the alphabet of Pˆ × Qˆ.
Specifically, if s_1 --(a,wt_1,l)--> s_2 and t_1 --wt_2--> t_2 are transitions in Pˆ × Qˆ and baseline B^{μ,d}_≤ respectively, then (s_1, t_1, i) --(a,wt_1,l)--> (s_2, t_2, j) is a transition in the intersection iff wt_1 = wt_2, where j = (i + 1) mod 2 if either s_1 or t_1 is an accepting state, and j = i otherwise. Automata *Dim* and Pˆ_−wt are obtained by projecting out the weight-component from the alphabets of Pˆ × Qˆ and Pˆ, respectively: the alphabets are converted from (a, wt, l) to (a, l). It is necessary to project out the weight-component since in Pˆ × Qˆ the weights represent differences of weights, whereas in Pˆ they represent absolute weights.
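A sketch of the weight-difference product of Algorithm 2, Line 5 (the tuple layout is our choice):

```python
def product_same_alpha(p_trans, q_trans):
    """QuIP-style product: match letters, record the weight difference
    wt1 - wt2, and keep only P's unique transition label l."""
    return [((s1, t1), (a1, w1 - w2, l), (s2, t2))
            for (s1, (a1, w1, l), s2) in p_trans
            for (t1, (a2, w2), t2) in q_trans
            if a1 == a2]

p = [("p0", ("a", 3, 0), "p0")]
q = [("q0", ("a", 1), "q0"), ("q0", ("b", 5), "q0")]
print(product_same_alpha(p, q))
# only the matching letter 'a' survives, with weight 3 - 1 = 2
```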

Finally, the language of *Dim* is compared with that of Pˆ_−wt, the automaton generated from Pˆ after discarding the weights from its transitions. It is easy to prove that *Dim* ⊆ Pˆ_−wt; therefore, instead of language-equivalence between *Dim* and Pˆ_−wt, it is sufficient to check whether Pˆ_−wt ⊆ *Dim*. QuIP utilizes an LI-solver as a black box to perform this final step.

**Lemma 7** [Trans. Den. in *QuIP*]*. Let* δ_P*,* δ_Q *denote the transition-densities of* P *and* Q*, resp., and* μ *the upper bound of baseline* B^{μ,d}_≤*. The number of states and the transition-density of Dim are* O(s_P s_Q μ) *and* O(δ_P δ_Q · μ²)*, respectively.*

**Theorem 5** [Practical complexity of *QuIP*]*. Let* P *and* Q *be DS-automata with* s_P *and* s_Q *states, respectively, and maximum weight on transitions* μ*. The worst-case complexity of QuIP for integer discount-factor* d > 1*, when language-equivalence is performed via Ramsey-based inclusion testing, is* 2^{O(s_P² · s_Q² · μ²)}*.*

Theorem 5 demonstrates that while the complexity of QuIP (in practice) improves upon that of BCV, it is still worse than that of DetLP.

# **5 Experimental Evaluation**

We provide implementations of our tools QuIP and DetLP and conduct experiments on a large number of synthetically generated benchmarks to compare their performance. We seek answers to the following questions: (1) Which tool has better performance, as measured by runtime and the number of benchmarks solved? (2) How does a change in transition-density affect the performance of the tools? (3) How dependent are our tools on their underlying solvers?

### **5.1 Implementation Details**

We implement our tools QuIP and DetLP in C++, with compiler optimization -O3 enabled. We implement our own library for all Büchi-automata and DS-automata operations, except for language-inclusion, for which we use the state-of-the-art LI-solver RABIT [4] as a black box. We enable the -fast flag in RABIT, and tune its Java threads with -Xss, -Xms, -Xmx set to 1 GB, 1 GB, and 8 GB, respectively. We use the large-scale LP-solver GLPSOL provided by GLPK (GNU Linear Programming Kit) [2] inside DetLP. We did not tune GLPSOL since it consumes a very small percentage of the total time in DetLP, as we see later in Fig. 4.

We also employ some implementation-level optimizations. Various steps of QuIP and DetLP, such as the product, DS-determinization, and baseline construction, involve the creation of new automaton states and transitions. We reduce their size by adding a new state only if it is reachable from the initial state, and a new transition only if it originates from such a state.
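This reachability-based optimization can be sketched as an on-the-fly product construction (a minimal version of our own):

```python
from collections import deque

def reachable_product(init1, init2, delta1, delta2):
    """Build a synchronous product on the fly from the initial pair; a
    product state is materialized only when it is actually reached."""
    init = (init1, init2)
    states, trans, work = {init}, [], deque([init])
    while work:
        s1, s2 = work.popleft()
        for (u1, a, v1) in delta1:
            if u1 != s1:
                continue
            for (u2, b, v2) in delta2:
                if u2 == s2 and a == b:
                    t = (v1, v2)
                    trans.append(((s1, s2), a, t))
                    if t not in states:
                        states.add(t)
                        work.append(t)
    return states, trans

d1 = [("p", "a", "p"), ("p", "b", "r")]
d2 = [("q", "a", "q")]
states, trans = reachable_product("p", "q", d1, d2)
print(states)  # ("r", ...) is never materialized: no matching 'b' in d2
```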

The baseline automaton is constructed over the restricted alphabet of only those weights that appear in the product Pˆ × Qˆ, so as to include only the necessary transitions. We also reduce its size with the Büchi minimization tool Reduce [4].

Since all states of Pˆ × Qˆ are accepting, we conduct the intersection in a way that avoids doubling the number of product states. This is possible because it is sufficient to keep track of whether words visit accepting states in the baseline.

### **5.2 Benchmarks**

To the best of our knowledge, there are no standardized benchmarks for DS-automata. We attempted to experiment with examples that appear in research papers; however, these examples are too few and too small to render an informative view of the performance of the tools. Following a standard approach to the performance evaluation of automata-theoretic tools [5,19,22], we experiment with our tools on *randomly generated* benchmarks.

**Random Weighted-Automata Generation.** The parameters for our random weighted-automata generation procedure are the number of states N, the transition density δ, and an upper bound μ for weights on transitions. The states are represented by the set {0, 1, ..., N−1}. All states of the weighted automata are accepting, and they have a unique initial state 0. The alphabet of all weighted automata is fixed to Σ = {a, b}. Weights on transitions range from 0 to μ−1. For our experiments we only generate complete weighted automata. These weighted automata are generated only if the number of transitions N · δ is at least N · |Σ|, since there must be at least one transition on each letter from every state. We first complete the weighted automata by creating a transition from each state on every letter; here the destination state and weight are chosen randomly. The remaining (N · δ − N · |Σ|)-many transitions are generated by selecting all parameters randomly, i.e., the source and destination states from {0, ..., N−1}, the letter from Σ, and the weight from {0, ..., μ−1}.
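The generation procedure above can be sketched as follows (an illustrative Python version; the actual implementation is part of the authors' C++ tools):

```python
import random

def random_weighted_automaton(n_states, density, max_weight,
                              alphabet=("a", "b"), seed=None):
    """Randomly generate a complete weighted automaton as described above.
    Transitions are (source, letter, target, weight) tuples; state 0 is
    the unique initial state and all states are accepting. Sketch only."""
    rng = random.Random(seed)
    n_transitions = int(n_states * density)
    # Completeness needs at least one transition per letter per state.
    assert n_transitions >= n_states * len(alphabet)
    transitions = []
    # Step 1: complete the automaton -- one transition from every state
    # on every letter, with random destination and weight.
    for src in range(n_states):
        for letter in alphabet:
            transitions.append((src, letter,
                                rng.randrange(n_states),
                                rng.randrange(max_weight)))
    # Step 2: the remaining N*density - N*|alphabet| transitions pick
    # source, letter, destination, and weight all at random.
    for _ in range(n_transitions - n_states * len(alphabet)):
        transitions.append((rng.randrange(n_states),
                            rng.choice(alphabet),
                            rng.randrange(n_states),
                            rng.randrange(max_weight)))
    return transitions
```

The resulting automaton may contain parallel transitions on the same letter, which is consistent with nondeterministic weighted automata.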

### **5.3 Design and Setup for Experimental Evaluation**

Our experiments were designed with the objective of comparing DetLP and QuIP. Due to the lack of standardized benchmarks, we conduct our experiments on randomly generated benchmarks. The parameters for P ⊆_d Q are therefore the number of states s_P and s_Q, the transition density δ, and the maximum weight wt. We seek answers to the questions described at the beginning of Sect. 5.

Each instantiation of the parameter tuple (s_P, s_Q, δ, wt) and a choice of tool between QuIP and DetLP corresponds to one experiment. In each experiment, the weighted automata P and Q are randomly generated with the parameters (s_P, δ, wt) and (s_Q, δ, wt), respectively, and language inclusion is performed by the chosen tool. Since all inputs are randomly generated, each experiment is repeated 50 times to obtain statistically significant data. Each experiment is run for a total of 1000 seconds on a single node of a high-performance cluster. Each node of the cluster consists of two quad-core Intel Xeon processors running at 2.83 GHz, with 8 GB of memory per node. Experiments that do not terminate within the given time limit are assigned a runtime of ∞. We report the median of the runtime data collected from all iterations of the experiment.

These experiments are scaled up by increasing the size of the inputs. The worst-case analysis of QuIP demonstrates that it is symmetric in s_P and s_Q, making the algorithm impartial to which of the two inputs is scaled (Theorem 5). On the other hand, the complexity of DetLP is dominated by s_Q (Theorem 2). Therefore, we scale up our experiments by increasing s_Q only.

Since DetLP is restricted to complete automata, these experiments are conducted on complete weighted automata only. We collect data on the total runtime of each tool, the time consumed by the underlying solver, and the number of times each experiment terminates with the given resources. We experiment with s_P = 10, δ ranging between 2.5 and 4 in increments of 0.5 (we take a lower bound of 2.5 since |Σ| = 2), wt ∈ {4, 5}, s_Q ranging from 0 to 1500 in increments of 25, and d = 3. These sets of experiments also suffice for testing the scalability of both tools.

### **5.4 Observations**

We first compare the tools based on the number of benchmarks each can solve. We also attempt to unravel the main cause of failure of each tool. Out of the 50 experiments for each parameter value, DetLP consistently solves more benchmarks than QuIP for the same parameter values (Fig. 3a–b)<sup>1</sup>. The figures also reveal that both tools solve more benchmarks at lower transition density. The most common (in fact, almost exclusive) reason for QuIP to fail before its timeout was reported to be memory overflow inside RABIT during language inclusion between P̂−wt and *Dim*. On the other hand, the main cause of failure of DetLP was reported to be memory overflow during DS-determinization and preprocessing of the determinized DS-automata before GLPSOL is invoked. This occurs due to the sheer size of the determinized DS-automata, which can very quickly become very large. These empirical observations indicate that the bottlenecks in QuIP and DetLP may be language inclusion and explicit DS-determinization, respectively.

We investigate the above intuition by analyzing the runtime trends for both tools. Figure 4a plots the runtime of both tools. The plot shows that QuIP fares significantly better than DetLP in runtime at δ = 2.5. The plots for both tools on a log scale appear curved (Fig. 4a), suggesting a sub-exponential runtime complexity. The same was observed at higher δ as well. However, at higher δ we observe a few outliers on the runtime-trend graphs of QuIP at larger inputs, when just a few more than 50% of the runs are successful. This is expected since, effectively, the median reports the runtime of the slower runs in these cases. Figure 4b records the ratio of total time spent inside RABIT and GLPSOL. The plot reveals that QuIP spends most of its time inside RABIT. We also observe that most memory consumption in QuIP occurs inside RABIT. In contrast, GLPSOL consumes a negligible amount of time and memory in DetLP. Clearly, the performance of QuIP and DetLP is dominated by RABIT and explicit DS-determinization, respectively. We also determined how the runtime performance of the tools changes with an increasing discount factor d. Both tools consume less time as d increases.

Finally, we test the scalability of both tools. In Fig. 5a, we plot the median of the total runtime as s_Q increases at δ = 2.5, 3 (s_P = 10, μ = 4) for QuIP. We attempt to best-fit the data points for each δ with functions that are linear, quadratic, and cubic in s_Q using the method of least squares of residuals. Figure 5b does the same for

<sup>1</sup> Figures are best viewed online and in color.

**Fig. 3.** Number of benchmarks solved out of 50 as s_Q increases, with s_P = 10, μ = 4; δ = 2.5 and δ = 4 in Fig. 3a and b, respectively.

**Fig. 4.** Time trends: Fig. 4a plots total runtime as s_Q increases (s_P = 10, μ = 4, δ = 2.5); the figure shows median time for each parameter value. Figure 4b plots the ratio of time spent by each tool inside its solver at the same parameter values.

DetLP. We observe that QuIP and DetLP are best fit by functions that are linear and quadratic in s_Q, respectively.
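The model-fitting step can be reproduced with standard least squares; the snippet below (with made-up data points standing in for the measurements of Fig. 5) compares the residuals of linear, quadratic, and cubic fits. Since a higher-degree fit never increases the residual, one checks whether the quadratic and cubic terms yield an appreciable improvement over the linear fit rather than simply taking the minimum:

```python
import numpy as np

# Hypothetical (s_Q, median runtime) data, roughly linear; the real
# data points are those plotted in Fig. 5a for QuIP.
s_q = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
runtime = np.array([1.1, 2.0, 4.2, 7.9, 16.1])

def residual_sum(x, y, degree):
    """Least-squares polynomial fit of the given degree; return the
    sum of squared residuals used to judge the quality of the fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

# Residuals for linear, quadratic, and cubic models.
fits = {d: residual_sum(s_q, runtime, d) for d in (1, 2, 3)}
```

If the residual of the linear fit is already close to those of the higher-degree fits, the linear model is preferred as the best explanation of the data.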

**Inferences and Discussion.** Our empirical analysis arrives at conclusions that a purely theoretical exploration would not have. First of all, we observe that despite having the worse theoretical complexity, the median-time complexity of QuIP is better than that of DetLP by an order of n. In theory, QuIP scales exponentially in s_Q, but in practice its runtime scales only linearly in s_Q. Similarly, the runtime of DetLP scales quadratically in s_Q. The huge margin of this complexity difference emphasizes why a solely theoretical analysis of algorithms is not sufficient.

Earlier empirical analysis of LI-solvers had made us aware of their dependence on the transition density δ. As a result, we were able to design QuIP cognizant of the parameter δ. Therefore, its runtime dependence on δ is not surprising. However, our empirical analysis also reveals a runtime dependence of DetLP on δ. This is unexpected, since δ does not appear in any complexity-theoretic analysis of DetLP (Theorem 1). We suspect this behavior occurs because the creation of each transition, say on letter a, during DS-determinization requires the procedure to analyze every transition on letter a in the original DS-automaton.

**Fig. 5.** Scalability of QuIP (Fig. 5a) and DetLP (Fig. 5b) at δ = 2.5, 3. Figures show median time for each parameter value.

The higher the transition density, the more transitions there are in the original DS-automaton, and hence the more expensive the creation of transitions during DS-determinization.

We have already noted that the performance of QuIP is dominated by RABIT in space and time. Currently, RABIT is implemented in Java. Although RABIT surpasses all other LI-solvers in overall performance, we believe it can be improved significantly via a more space-efficient implementation in a more performance-oriented language such as C++. This would, in turn, enhance QuIP.

The current implementation of DetLP utilizes the vanilla algorithm for DS-determinization. Since DS-determinization dominates DetLP, there is certainly merit in designing efficient algorithms for DS-determinization. However, we suspect this will be of limited advantage to DetLP, since it will continue to incur the complete cost of explicit DS-determinization due to the separation of automata-theoretic and numeric reasoning.

Based on our observations, we propose to exploit the complementary strengths of both tools: first, apply QuIP with a small timeout; since DetLP solves more benchmarks, apply DetLP only if QuIP fails.
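The proposed portfolio can be sketched as a small driver (hypothetical: the command names `quip` and `detlp` and their interfaces are placeholders, not the actual tools' CLIs):

```python
import subprocess

def portfolio_inclusion(p_file, q_file, quip_timeout=60, detlp_timeout=1000):
    """Sketch of the portfolio strategy proposed above: try QuIP with a
    small timeout, fall back to DetLP if QuIP does not finish. The
    command names are placeholders for the real tool invocations."""
    for cmd, timeout in (("quip", quip_timeout), ("detlp", detlp_timeout)):
        try:
            result = subprocess.run([cmd, p_file, q_file],
                                    capture_output=True, timeout=timeout)
            return cmd, result.stdout
        except (subprocess.TimeoutExpired, FileNotFoundError):
            continue  # this tool failed or is unavailable; try the next
    return None, None  # neither tool produced an answer
```

The small first timeout exploits QuIP's faster median runtime, while DetLP's higher solve rate serves as the fallback.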

### **6 Concluding Remarks and Future Directions**

This paper presents the first empirical evaluation of algorithms and tools for DS-inclusion. We present two tools, DetLP and QuIP. Our first tool, DetLP, is based on explicit DS-determinization and linear programming, and yields an exponential-time and -space algorithm. Our second tool, QuIP, improves upon a previously known comparator-based automata-theoretic algorithm BCV by means of an optimized comparator construction, called the universal automaton. Despite its PSPACE-complete theoretical complexity, we note that all practical implementations of QuIP are also exponential in time and space.

The focus of this work is to investigate these tools in practice. In theory, the exponential complexity of QuIP is worse than that of DetLP. Our empirical evaluation reveals the opposite: the median-time complexity of QuIP is better than that of DetLP by an order of n. Specifically, QuIP scales linearly while DetLP scales quadratically in the size of the inputs. This reasserts the gap between theory and practice, and asserts the need for better metrics for practical algorithms. Further empirical analysis by scaling the right-hand-side automaton will be beneficial.

Nevertheless, DetLP consistently solves more benchmarks than QuIP. Most of QuIP's experiments fail due to memory overflow within the LI-solver, indicating that more space-efficient implementations of LI-solvers would boost QuIP's performance. We are less optimistic about DetLP, though. Our evaluation highlights the impediment of explicit DS-determinization, a cost that is unavoidable in DetLP's separation-of-concerns approach. This motivates future research that integrates automata-theoretic and numerical reasoning, perhaps by combining implicit DS-determinization with baseline-automata-like reasoning, to design an on-the-fly algorithm for DS-inclusion.

Last but not least, our empirical evaluations led to discovering the dependence of the runtime of algorithms on parameters that had not featured in their worst-case theoretical analysis, such as the dependence of DetLP on transition density. Such evaluations build a deeper understanding of algorithms, and will hopefully serve as a guiding light for in-tandem theoretical and empirical investigation of algorithms for quantitative analysis.

**Acknowledgements.** We thank anonymous reviewers for their comments. We thank K. S. Meel, A. A. Shrotri, L. M. Tabajara, and S. Zhu for helpful discussions. This work was partially supported by NSF Grant No. 1704883, "Formal Analysis and Synthesis of Multiagent Systems with Incentives".

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Model Checking Indistinguishability of Randomized Security Protocols**

Matthew S. Bauer1,4(B) , Rohit Chadha<sup>2</sup>, A. Prasad Sistla<sup>3</sup>, and Mahesh Viswanathan<sup>1</sup>

<sup>1</sup> University of Illinois at Urbana-Champaign, Champaign, USA msbauer2@illinois.edu <sup>2</sup> University of Missouri, Columbia, USA <sup>3</sup> University of Illinois at Chicago, Chicago, USA <sup>4</sup> Galois Inc., Arlington, USA

**Abstract.** The design of security protocols is extremely subtle and vulnerable to potentially devastating flaws. As a result, many tools and techniques for the automated verification of protocol designs have been developed. Unfortunately, these tools don't have the ability to model and reason about protocols with randomization, which are becoming increasingly prevalent in systems providing privacy and anonymity guarantees. The security guarantees of these systems are often formulated by means of the indistinguishability of two protocols. In this paper, we give the first practical algorithms for model checking indistinguishability properties of randomized security protocols against the powerful threat model of a bounded Dolev-Yao adversary. Our techniques are implemented in the Stochastic Protocol ANalyzer (Span) and evaluated on several examples. As part of our evaluation, we conduct the first automated analysis of an electronic voting protocol based on the 3-ballot design.

# **1 Introduction**

Security protocols are highly intricate and vulnerable to design flaws. This has led to a significant effort in the construction of tools for the automated verification of protocol designs. In order to make automation feasible [8,12,15,23,34,48,55], the analysis is often carried out in the *Dolev-Yao* threat model [30], where the assumption of perfect cryptography is made. In the Dolev-Yao model, the omnipotent adversary has the ability to read, intercept, modify and replay all messages on public channels, remember the communication history as well as non-deterministically inject its own messages into the network while remaining anonymous. In this model, messages are symbolic terms modulo

M. S. Bauer and M. Viswanathan—Partially supported by grant NSF CNS 1314485. R. Chadha—Partially supported by grants NSF CNS 1314338 and NSF CNS 1553548.

A. Prasad Sistla—Partially supported by grants NSF CNS 1314485 and CCF 1564296.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 117–135, 2018. https://doi.org/10.1007/978-3-319-96142-2_10

an equational theory (as opposed to bit-strings) and cryptographic operations are modeled via equations in the theory.

A growing number of security protocols employ randomization to achieve privacy and anonymity guarantees. Randomization is essential in protocols/systems for anonymous communication and web browsing such as Crowds [49], mix-networks [21], onion routers [37] and Tor [29]. It is also used in fair exchange [11, 35], vote privacy in electronic voting [6,20,52,54] and denial of service prevention [40]. In the example below, we demonstrate how randomization is used to achieve privacy in electronic voting systems.

*Example 1.* Consider a simple electronic voting protocol for 2 voters Alice and Bob, two candidates and an election authority. The protocol is as follows. Initially, the election authority will generate two private tokens t<sup>A</sup> and t<sup>B</sup> and send them to Alice and Bob encrypted under their respective public keys. These tokens will be used by the voters as proofs of their eligibility. After receiving a token, each voter sends his/her choice to the election authority along with the proof of eligibility encrypted under the public key of the election authority. Once all votes have been collected, the election authority tosses a fair private coin. The order in which Alice and Bob's votes are published depends on the result of this coin toss. *Vote privacy* demands that an adversary not be able to deduce how each voter voted.

All the existing Dolev-Yao analysis tools are fundamentally limited to protocols that are purely non-deterministic, where non-determinism models concurrency as well as the interaction between protocol participants and their environment. There are currently no analysis tools that can faithfully reason about protocols like those in Example 1, a limitation that has long been identified by the verification community. In the context of electronic voting protocols, [28] identifies three main classes of techniques for achieving vote privacy: blind signature schemes, homomorphic encryption and randomization. There the authors concede that protocols based on the latter technique are "hard to address with our methods that are purely non-deterministic." Catherine Meadows, in her summary of the over 30 year history of formal techniques in cryptographic protocol analysis [46,47], identified the development of formal analysis techniques for anonymous communication systems, almost exclusively built using primitives with randomization, as a fundamental and still largely unsolved challenge. She writes, "it turned out to be difficult to develop formal models and analyses of large-scale anonymous communication. The main stumbling block is the threat model".

In this work, we take a major step towards overcoming this long-standing challenge and introduce the first techniques for automated Dolev-Yao analysis of randomized security protocols. In particular, we propose two algorithms for determining indistinguishability of randomized security protocols and implemented them in the Stochastic Protocol ANalyzer (Span). Several works [7,9,28,32,41] have identified indistinguishability as the natural mechanism to model security guarantees such as anonymity, unlinkability, and privacy. Consider the protocol from Example 1, designed to preserve vote privacy. Such a property holds if the executions of the protocol in which Alice votes for candidate 1 and Bob votes for candidate 2 cannot be distinguished from the executions of the protocol in which Alice votes for candidate 2 and Bob votes for candidate 1.

Observe that in Example 1, it is crucial that the result of the election authority's coin toss is not visible to the adversary. Indeed if the adversary is allowed to "observe" the results of private coin tosses, then the analysis may reveal "security flaws" in correct security protocols (see examples in [13,17,19,22,36]). Thus, many authors [10,13,17–19,22,26,36] have proposed that randomized protocols be analyzed with respect to adversaries that are forced to schedule the same action in any two protocol executions that are indistinguishable to them.

For randomized security protocols, [10,18,53] have proposed that trace equivalence from the applied π-calculus [5] serve as the indistinguishability relation on traces. In this framework, the protocol semantics are described by partially observable Markov decision processes (POMDPs), where the adversary's actions are modeled non-deterministically. The adversary is required to choose its next action based on the partial information that it can observe about the execution thus far. This allows us to model the privacy of coin tosses. Two security protocols are said to be indistinguishable [18,53] if their semantic descriptions as POMDPs are indistinguishable. Two POMDPs M and M′ are said to be indistinguishable if for any adversary A and trace o, the probability of the executions that generate the trace o with respect to A is the same for both M and M′.

Our algorithms for indistinguishability in randomized security protocols are built on top of techniques for solving indistinguishability in finite POMDPs. Our first result shows that indistinguishability of finite POMDPs is **P**-complete. Membership in **P** is established by a reduction of POMDP indistinguishability to equivalence in probabilistic finite automata (PFAs), which is known to be **P**-complete [31,45,57]. Further, we show that the hardness result continues to hold for acyclic POMDPs. An acyclic POMDP is a POMDP that has a set of "final" absorbing states and the only cycles in the underlying graph are self-loops on these states.

For acyclic finite POMDPs, we present another algorithm for checking indistinguishability based on the technique of translating a POMDP M into a fully observable Markov decision process (MDP), known as the belief MDP B(M) of M. It was shown in [14] that two POMDPs are indistinguishable if and only if the belief MDPs they induce are bisimilar as labeled Markov decision processes. When M is acyclic and finite, its belief MDP B(M) is finite and acyclic, and its bisimulation relation can be checked recursively.
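One step of the belief-MDP construction B(M) can be sketched as follows (an illustrative Python fragment, not Span's implementation): given a belief, an action, and the observation of the successor state, it conditions the successor distribution on that observation:

```python
from fractions import Fraction

def belief_update(belief, action, observation, delta, obs):
    """One step of the belief MDP: `belief` is a dict from states to
    Fractions, `delta` maps (state, action) to successor distributions,
    and `obs` labels states with observations. Returns the conditioned
    successor belief, or None if the observation has probability zero."""
    new = {}
    for z, p in belief.items():
        for z2, p2 in delta[(z, action)].items():
            if obs[z2] == observation:
                new[z2] = new.get(z2, Fraction(0)) + p * p2
    total = sum(new.values(), Fraction(0))
    if total == 0:
        return None  # this observation cannot occur from this belief
    return {z: p / total for z, p in new.items()}
```

Starting from the Dirac belief on the initial state, repeatedly applying this update for an acyclic POMDP enumerates the finite, acyclic belief MDP on which bisimulation is checked recursively.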

Protocols in Span are described by a finite set of roles (agents) that interact asynchronously by passing messages. Each role models an agent in a protocol session, and hence we only consider a bounded number of sessions. An action in a role performs either a message input, a message output, or a test on messages. The adversary schedules the order in which these actions are executed and generates input recipes comprised of public information and messages previously output by the agents. In general, there are an unbounded number of input recipes available at each input step, resulting in POMDPs that are infinitely branching. Span, however, searches for bounded attacks by bounding the size of attacker messages. Under this assumption, protocols give rise to finite acyclic POMDPs. Even with this assumption, protocols specified in Span describe POMDPs that are exponentially larger than their description. Nevertheless, we show that when considering protocols defined over subterm convergent equational theories, indistinguishability of randomized security protocols is in **PSPACE** for bounded Dolev-Yao adversaries. We further show that the problem is harder than #SAT_D and hence is both **NP**-hard and **coNP**-hard.

The main engine of Span translates a randomized security protocol into an acyclic finite POMDP by recursively unrolling all protocol executions and grouping states according to those that are indistinguishable. We implemented two algorithms for checking indistinguishability in Span. The first algorithm, called the PFA algorithm, checks indistinguishability of P and P′ by converting them to corresponding PFAs A and A′ as in the proof of decidability of indistinguishability of finite POMDPs. PFA equivalence can then be solved through a reduction to linear programming [31]. The second algorithm, called the on-the-fly (OTF) algorithm, is based on the technique of checking bisimulation of belief MDPs. Although asymptotically less efficient than the PFA algorithm, the recursive procedure for checking bisimulation in belief MDPs can be embedded into the main engine of Span with little overhead, allowing one to analyze indistinguishability on-the-fly as the POMDP models are constructed.

In our evaluation of the indistinguishability algorithms in Span, we conduct the first automated Dolev-Yao analysis for several new classes of security protocols including dining cryptographers networks [38], mix networks [21] and a 3-ballot electronic voting protocol [54]. The analysis of the 3-ballot protocol, in particular, demonstrates that our techniques can push symbolic protocol verification to new frontiers. The protocol is a full scale, real world example, which to the best of our knowledge, hasn't been analyzed using any existing probabilistic model checker or protocol analysis tool.

*Summary of Contributions.* We showed that the problem of checking indistinguishability of POMDPs is **P**-complete. The indistinguishability problem for bounded instances of randomized security protocols over subterm convergent equational theories (bounded number of sessions and bounded adversarial non-determinism) is shown to be in **PSPACE** and #SAT_D-hard. We proposed and implemented two algorithms in the Span protocol analysis tool for deciding indistinguishability in bounded instances of randomized security protocols and compared their performance on several examples. Using Span, we conducted the first automated verification of a 3-ballot electronic voting protocol.

*Related Work.* As alluded to above, techniques for analyzing security protocols have remained largely disjoint from techniques for analyzing systems with randomization. Using probabilistic model checkers such as PRISM [44], STORM [27] and APEX [42], some have attempted to verify protocols that explicitly employ randomization [56]. These ad-hoc techniques fail to capture powerful threat models, such as a Dolev-Yao adversary, and don't provide a general verification framework. Other works in the Dolev-Yao framework [28,43] simply abstract away essential protocol components that utilize randomization, such as anonymous channels. The first formal framework combining Dolev-Yao analysis with randomization appeared in [10], where the authors studied the conditions under which security properties of randomized protocols are preserved by protocol composition. In [53], the results were extended to indistinguishability.

Complexity-theoretic results on verifying secrecy and indistinguishability properties of bounded sessions of randomized security protocols against unbounded Dolev-Yao adversaries were studied in [18]. There the authors considered protocols with a fixed equational theory<sup>1</sup> and no negative tests (else branches). Both secrecy and indistinguishability were shown to be in **coNEXPTIME**, with secrecy being **coNEXPTIME**-hard. The analogous problems for purely non-deterministic protocols are known to be **coNP**-complete [25,33,51]. When one fixes, a priori, the number of coin tosses, secrecy and indistinguishability in randomized protocols again become **coNP**-complete. In our asymptotic complexity results and in the Span tool, we consider a general class of equational theories and protocols that allow negative tests.

# **2 Preliminaries**

We assume that the reader is familiar with probability distributions. For a set X, Dist(X) shall denote the set of all discrete distributions μ on X such that μ(x) is a rational number for each x ∈ X. For x ∈ X, δ_x will denote the Dirac distribution, i.e., the measure μ such that μ(x) = 1. The *support* of a discrete distribution μ, denoted support(μ), is the set of all elements x such that μ(x) ≠ 0.
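These notions are easy to make concrete; the following Python sketch (illustrative only, representing a distribution in Dist(X) as a dict from elements to Fractions) implements the Dirac distribution and support:

```python
from fractions import Fraction

def dirac(x):
    """The Dirac distribution delta_x: all probability mass on x."""
    return {x: Fraction(1)}

def support(mu):
    """Support of a discrete distribution: elements with nonzero mass."""
    return {x for x, p in mu.items() if p != 0}

# Example: a fair coin is a distribution in Dist({"heads", "tails"})
# with rational probabilities, as required by the definition above.
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
```

Fractions keep all probabilities exact rationals, matching the requirement that μ(x) be rational for each x.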

**Markov Decision Processes (MDPs).** MDPs are used to model processes that exhibit both probabilistic and non-deterministic behavior. An MDP M is a tuple (Z, z_s, Act, Δ) where Z is a countable set of states, z_s ∈ Z is the initial state, Act is a countable set of actions and Δ : Z × Act → Dist(Z) is the probabilistic transition function. M is said to be finite if the sets Z and Act are finite. An execution of an MDP is a sequence $\rho = z_0 \xrightarrow{\alpha_1} z_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} z_m$ such that z_0 = z_s and $z_{i+1} \in \mathsf{support}(\Delta(z_i, \alpha_{i+1}))$ for all i ∈ {0, ..., m−1}. The *measure* of ρ, denoted prob_M(ρ), is $\prod_{i=0}^{m-1} \Delta(z_i, \alpha_{i+1})(z_{i+1})$. For the execution ρ, we write last(ρ) = z_m and say that the length of ρ, denoted |ρ|, is m. The set of all executions of M is denoted Exec(M).
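The measure of an execution follows directly from the definition; the sketch below (a hypothetical dict-based encoding of Δ, not tied to any tool) multiplies the one-step probabilities along an execution:

```python
from fractions import Fraction

# A toy MDP transition function Delta: (state, action) -> distribution
# over successor states.  Purely illustrative.
delta = {
    ("z0", "a"): {"z1": Fraction(1, 2), "z2": Fraction(1, 2)},
    ("z1", "b"): {"z1": Fraction(1)},
    ("z2", "b"): {"z1": Fraction(1, 3), "z2": Fraction(2, 3)},
}

def prob_of_execution(delta, execution):
    """Measure of an execution z0 -a1-> z1 ... -am-> zm, passed as the
    flat list [z0, a1, z1, ..., am, zm]: the product of the one-step
    probabilities Delta(z_i, a_{i+1})(z_{i+1})."""
    p = Fraction(1)
    for i in range(0, len(execution) - 1, 2):
        state, action, succ = execution[i], execution[i + 1], execution[i + 2]
        p *= delta[(state, action)].get(succ, Fraction(0))
    return p
```

An execution that takes a zero-probability step (i.e., leaves the support) gets measure 0.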

**Partially Observable Markov Decision Processes (POMDPs).** A POMDP M is a tuple (Z, z_s, Act, Δ, O, obs) where M_0 = (Z, z_s, Act, Δ) is an MDP, O is a countable set of observations and obs : Z → O is a labeling of states with observations. M is said to be finite if M_0 is finite. The set of executions of M_0 is taken to be the set of executions of M, i.e., we define Exec(M) as the set Exec(M_0). Given an execution $\rho = z_0 \xrightarrow{\alpha_1} z_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} z_m$ of M, the trace of

<sup>1</sup> The operations considered are pairing, hashing, encryption and decryption.

ρ is tr(ρ) = obs(z_0) α_1 obs(z_1) α_2 ··· α_m obs(z_m). For a POMDP M and a sequence o ∈ O · (Act · O)*, the probability of o in M, written prob_M(o), is the sum of the measures of the executions in Exec(M) with trace o. Given two POMDPs M_0 and M_1 with the same set of actions Act and the same set of observations O, we say that M_0 and M_1 are *distinguishable* if there exists o ∈ O · (Act · O)* such that prob_{M_0}(o) ≠ prob_{M_1}(o). If M_0 and M_1 cannot be distinguished, they are said to be *indistinguishable*. We write M_0 ≈ M_1 if M_0 and M_1 are indistinguishable. As is the case in [18,53], indistinguishability can also be defined through the notion of an adversary. Our formulation is equivalent, even when the adversary is allowed to toss coins [18].

**Probabilistic Finite Automata (PFAs).** A PFA is like a finite-state deterministic automaton except that the transition function from a state on a given input is described as a probability distribution. Formally, a PFA A is a tuple (Q, Σ, q_s, Δ, F) where Q is a finite set of states, Σ is a finite input alphabet, q_s ∈ Q is the initial state, Δ : Q × Σ → Dist(Q) is the transition relation and F ⊆ Q is the set of accepting states. A run ρ of A on an input word u = a_1 a_2 ··· a_m ∈ Σ* is a sequence q_0 q_1 ··· q_m ∈ Q* such that q_0 = q_s and $q_i \in \mathsf{support}(\Delta(q_{i-1}, a_i))$ for each 1 ≤ i ≤ m. For the run ρ on word u, its measure, denoted prob_{A,u}(ρ), is $\prod_{i=1}^{m} \Delta(q_{i-1}, a_i)(q_i)$. The run ρ is called *accepting* if q_m ∈ F. The probability of accepting a word u ∈ Σ*, written prob_A(u), is the sum of the measures of the accepting runs on u. Two PFAs A_0 and A_1 with the same input alphabet Σ are said to be equivalent, denoted A_0 ≡ A_1, if prob_{A_0}(u) = prob_{A_1}(u) for all input words u ∈ Σ*.
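The acceptance probability of a word can be computed with the standard forward algorithm instead of enumerating runs; the following Python sketch (an illustrative dict-based encoding of Δ) propagates a state distribution letter by letter and finally sums the mass in accepting states:

```python
from fractions import Fraction

def accept_prob(delta, q_s, final, word):
    """Probability that the PFA (with transition function `delta`,
    initial state `q_s`, accepting set `final`) accepts `word`."""
    dist = {q_s: Fraction(1)}          # current distribution over states
    for letter in word:
        new_dist = {}
        for q, p in dist.items():
            for q2, p2 in delta[(q, letter)].items():
                new_dist[q2] = new_dist.get(q2, Fraction(0)) + p * p2
        dist = new_dist
    # sum of the measures of the accepting runs
    return sum((p for q, p in dist.items() if q in final), Fraction(0))
```

This is linear in the length of the word rather than exponential in the number of runs.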

# **3 POMDP Indistinguishability**

In this section, we study the underlying semantic objects of randomized security protocols, POMDPs. The techniques we develop for analyzing POMDPs provide the foundation for the indistinguishability algorithms we implement in the Span protocol analysis tool. Our first result shows that indistinguishability of finite POMDPs is decidable in polynomial time by a reduction to PFA equivalence, which is known to be decidable in polynomial time [31,57].

**Proposition 1.** *Indistinguishability of finite POMDPs is in* **P**.

*Proof (sketch).* Consider two POMDPs M_i = (Z_i, z_s^i, Act, Δ_i, O, obs_i) for i ∈ {0, 1} with the same set of actions Act and the same set of observations O. We shall construct PFAs A_0 and A_1 such that M_0 ≈ M_1 iff A_0 ≡ A_1 as follows. For i ∈ {0, 1}, let bad_i be a new state and define the PFA A_i = (Q_i, Σ, q_s^i, Δ'_i, F_i) where Q_i = Z_i ∪ {bad_i}, Σ = Act × O, q_s^i = z_s^i, F_i = Z_i and Δ'_i is defined as follows.

$$\Delta'\_i(q,(\alpha,o))(q') = \begin{cases} \Delta\_i(q,\alpha)(q') & \text{if } q, q' \in Z\_i \text{ and } \mathsf{obs}(q) = o\\ 1 & \text{if } q \in Z\_i, \,\mathsf{obs}(q) \neq o \text{ and } q' = \mathsf{bad}\_i\\ 1 & \text{if } q, q' = \mathsf{bad}\_i\\ 0 & \text{otherwise} \end{cases}$$

Let u = (α_1, o_0)(α_2, o_1) ··· (α_k, o_{k−1}) be a non-empty word over Σ. For the word u, let o_u be the trace o_0α_1o_1α_2 ··· α_{k−1}o_{k−1}. The proposition follows immediately from the observation that prob_{A_i}(u) = prob_{M_i}(o_u). □
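The construction in the proof can be sketched directly: every letter (α, o) checks that the current state carries observation o and then moves according to Δ_i(·, α), and a fresh sink state absorbs all mismatching mass. The encoding below (dictionaries, a string `'bad'` for bad_i) is a hypothetical one chosen for illustration:

```python
def pomdp_to_pfa(delta, obs, states, actions, observations):
    """Build the PFA transition function Delta'_i of the proof sketch.

    Letters are (action, observation) pairs; the fresh state 'bad'
    absorbs the mass of letters whose observation disagrees with the
    current state. Accepting states are all original POMDP states.
    """
    BAD = 'bad'
    pfa = {}
    for q in states:
        for a in actions:
            for o in observations:
                if obs[q] == o:
                    # Observation matches: behave like the POMDP.
                    pfa[(q, (a, o))] = dict(delta[(q, a)])
                else:
                    # Mismatch: all mass goes to the rejecting sink.
                    pfa[(q, (a, o))] = {BAD: 1.0}
    for a in actions:
        for o in observations:
            pfa[(BAD, (a, o))] = {BAD: 1.0}
    return pfa
```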

An MDP M = (Z, z_s, Act, Δ) is said to be acyclic if there is a set of absorbing states Z_abs ⊆ Z such that for all α ∈ Act and z ∈ Z_abs, Δ(z, α)(z) = 1, and for every ρ = z_0 --α_1--> ··· --α_m--> z_m ∈ Exec(M), if z_i = z_j for i ≠ j then z_i ∈ Z_abs. Intuitively, acyclic MDPs are MDPs that have a set of "final" absorbing states and the only cycles in the underlying graph are self-loops on these states. A POMDP M = (Z, z_s, Act, Δ, O, obs) is acyclic if the MDP M' = (Z, z_s, Act, Δ) is acyclic. We have the following result, which can be shown from the **P**-hardness of the PFA equivalence problem [45].

**Proposition 2.** *Indistinguishability of finite acyclic POMDPs is* **P***-hard. Hence indistinguishability of finite POMDPs is* **P***-complete.*

Thanks to Proposition 1, we can check indistinguishability for finite POMDPs by reducing it to PFA equivalence. We now present a new algorithm for indistinguishability of finite acyclic POMDPs. A well-known POMDP analysis technique is to translate a POMDP M into a fully observable belief MDP B(M) that emulates it; one can then analyze B(M) to infer properties of M. The states of B(M) are probability distributions over the states of M. Further, given a state b ∈ B(M), if states z_1, z_2 of M are such that b(z_1), b(z_2) are non-zero then z_1 and z_2 must have the same observation. Hence, by abuse of notation, we can define obs(b) to be obs(z) for any z with b(z) ≠ 0. Intuitively, an execution ρ = b_0 --α_1--> b_1 --α_2--> ··· --α_m--> b_m of B(M) corresponds to the set of all executions ρ' of M such that tr(ρ') = obs(b_0)α_1obs(b_1)α_2 ··· α_mobs(b_m). The measure of the execution ρ in B(M) is exactly prob_M(obs(b_0)α_1obs(b_1)α_2 ··· α_mobs(b_m)).

The initial state of B(M) is the distribution that assigns 1 to the initial state of M. Intuitively, in a given state b ∈ Dist(Z) and for an action α, there is at most one successor state b^{α,o} for each observation o. The probability of transitioning from b to b^{α,o} is the probability that o is observed given that the distribution on the states of M is b and action α is performed; b^{α,o}(z) is the conditional probability that the actual state of the POMDP is z. The formal definition follows.

**Definition 1.** *Let M = (Z, z_s, Act, Δ, O, obs) be a POMDP. The belief MDP of M, denoted B(M), is the tuple (Dist(Z), δ_{z_s}, Act, Δ_B) where Δ_B is defined as follows. For b ∈ Dist(Z), action α ∈ Act and o ∈ O, let*

$$p\_{b, \alpha, o} = \sum\_{z \in Z} b(z) \cdot \left( \sum\_{z' \in Z \land \mathsf{obs}(z') = o} \Delta(z, \alpha)(z') \right).$$

*Δ_B(b, α) is the unique distribution such that for each o ∈ O, if p_{b,α,o} ≠ 0 then Δ_B(b, α)(b^{α,o}) = p_{b,α,o}, where for all z' ∈ Z,*

$$b^{\alpha,o}(z') = \begin{cases} \frac{\sum\_{z \in Z} b(z) \cdot \Delta(z, \alpha)(z')}{p\_{b, \alpha, o}} & if \text{ obs}(z') = o\\ 0 & otherwise \end{cases}.$$
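Definition 1 amounts to one step of Bayesian filtering: group the one-step successor mass of the belief by observation, and normalize each group. A small sketch with Δ and obs encoded as dictionaries (an illustrative encoding, not the tool's):

```python
def belief_step(b, a, delta, obs):
    """One step of Definition 1: return {o: (p_{b,a,o}, b^{a,o})}.

    b is a belief (dict state -> probability); delta[(z, a)] is a dict
    mapping successor states to probabilities; obs[z] is z's observation.
    """
    unnorm = {}
    # Group the successor probability mass by observation.
    for z, bz in b.items():
        for z2, p in delta[(z, a)].items():
            d = unnorm.setdefault(obs[z2], {})
            d[z2] = d.get(z2, 0.0) + bz * p
    out = {}
    for o, d in unnorm.items():
        p_o = sum(d.values())                                # p_{b,a,o}
        out[o] = (p_o, {z: v / p_o for z, v in d.items()})   # b^{a,o}
    return out
```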

Let M_i = (Z_i, z_s^i, Act, Δ_i, O, obs_i) for i ∈ {0, 1} be POMDPs with the same set of actions and observations. In [14] the authors show that M_0 and M_1 are indistinguishable if and only if the beliefs δ_{z_s^0} and δ_{z_s^1} are *strongly belief bisimilar*. Strong belief bisimilarity coincides with the notion of bisimilarity of labeled MDPs: a pair of states (b_0, b_1) ∈ Dist(Z_0) × Dist(Z_1) is said to be strongly belief bisimilar if (i) obs(b_0) = obs(b_1), (ii) for all α ∈ Act and o ∈ O, p_{b_0,α,o} = p_{b_1,α,o}, and (iii) the pair (b_0^{α,o}, b_1^{α,o}) is strongly belief bisimilar whenever p_{b_0,α,o} = p_{b_1,α,o} > 0. Observe that, in general, belief MDPs are defined over an infinite state space. It is easy to see that, for a finite acyclic POMDP M, B(M) is acyclic and has a finite number of reachable belief states. Let M_0 and M_1 be as above and assume further that M_0, M_1 are finite and acyclic with absorbing states Z_abs ⊆ Z_0 ∪ Z_1. As a consequence of the result from [14] and the observations above, we can determine whether two states (b_0, b_1) ∈ Dist(Z_0) × Dist(Z_1) are strongly belief bisimilar using the on-the-fly procedure of Algorithm 1.

**Algorithm 1.** On-the-fly bisimulation for finite acyclic POMDPs

1: **function** Bisimilar(beliefState b_0, beliefState b_1)
2:   **if** obs(b_0) ≠ obs(b_1) **then return** false
3:   **if** support(b_0) ∪ support(b_1) ⊆ Z_abs **then return** true
4:   **for** α ∈ Act **do**
5:     **for** o ∈ O **do**
6:       **if** p_{b_0,α,o} ≠ p_{b_1,α,o} **then return** false
7:       **if** p_{b_0,α,o} > 0 **and** ¬Bisimilar(b_0^{α,o}, b_1^{α,o}) **then return** false
8:   **return** true
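Algorithm 1 can be prototyped directly over beliefs. The following is a minimal Python sketch, not Span's implementation: POMDPs are passed as pairs of plain dictionaries, Z_abs as a set, and exact rational arithmetic is replaced by a floating-point tolerance.

```python
def bisimilar(b0, b1, M0, M1, actions, zabs):
    """On-the-fly check in the style of Algorithm 1, for acyclic POMDPs.

    Each Mi is a pair (delta_i, obs_i) with delta_i[(z, a)] a dict of
    successor probabilities; beliefs are dicts state -> probability.
    """
    def successors(b, a, delta, obs):
        # Group one-step probability mass by the target's observation.
        by_obs = {}
        for z, bz in b.items():
            for z2, p in delta[(z, a)].items():
                d = by_obs.setdefault(obs[z2], {})
                d[z2] = d.get(z2, 0.0) + bz * p
        return {o: (sum(d.values()), d) for o, d in by_obs.items()}

    def observation(b, obs):
        # All states in a belief's support share one observation.
        return obs[next(iter(b))]

    (delta0, obs0), (delta1, obs1) = M0, M1
    if observation(b0, obs0) != observation(b1, obs1):
        return False
    if set(b0) | set(b1) <= zabs:
        return True
    for a in actions:
        s0 = successors(b0, a, delta0, obs0)
        s1 = successors(b1, a, delta1, obs1)
        for o in set(s0) | set(s1):
            p0 = s0.get(o, (0.0, {}))[0]
            p1 = s1.get(o, (0.0, {}))[0]
            if abs(p0 - p1) > 1e-9:   # p_{b0,a,o} != p_{b1,a,o}
                return False
            if p0 > 1e-9:             # recurse on (b0^{a,o}, b1^{a,o})
                n0 = {z: v / p0 for z, v in s0[o][1].items()}
                n1 = {z: v / p1 for z, v in s1[o][1].items()}
                if not bisimilar(n0, n1, M0, M1, actions, zabs):
                    return False
    return True
```

The recursion terminates because the POMDPs are acyclic: every recursive call moves the beliefs strictly closer to the absorbing states.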

# **4 Randomized Security Protocols**

We now present our core process calculus for modeling security protocols with coin tosses. The calculus closely resembles the ones from [10,53]. First proposed in [39], it extends the applied π-calculus [5] with a new operator for probabilistic choice. As in the applied π-calculus, the calculus assumes that messages are terms in a first-order signature, identified up to an equational theory.

### **4.1 Terms, Equational Theories and Frames**

A signature F contains a *finite* set of function symbols, each with an associated arity. We assume F contains two special disjoint sets, Npub and Npriv, of 0-ary symbols.<sup>2</sup> The elements of <sup>N</sup>pub are called *public names* and represent public nonces that can be used by the Dolev-Yao adversary. The elements of Npriv are

<sup>2</sup> As we assume F is finite, only a fixed number of public nonces is available to the adversary.

called *private names* and represent secret nonces and secret keys. We also assume a set of variables that is partitioned into two disjoint sets X and X_w. The variables in X are called *protocol variables* and are used as placeholders for messages input by protocol participants. The variables in X_w are called *frame variables* and are used to point to messages received by the Dolev-Yao adversary. Terms are built by the application of function symbols to variables and terms in the standard way. Given a signature F and Y ⊆ X ∪ X_w, we use T(F, Y) to denote the set of terms built over F and Y. The set of variables occurring in a term u is denoted by vars(u). A ground term is a term that contains no free variables.

A substitution σ is a partial function with a finite domain that maps variables to terms; dom(σ) denotes its domain and ran(σ) its range. For a substitution σ with dom(σ) = {x_1, ..., x_k}, we write σ as {x_1 → σ(x_1), ..., x_k → σ(x_k)}. A substitution σ is said to be ground if every term in ran(σ) is ground, and the substitution with an empty domain is denoted ∅. Substitutions can be applied to terms in the usual way and we write uσ for the term obtained by applying the substitution σ to the term u.
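Substitution application is a straightforward structural recursion. A sketch using nested tuples for terms (a hypothetical encoding chosen for illustration; variables and names are strings):

```python
def apply_subst(t, sigma):
    """Compute uσ: replace every variable in dom(σ) throughout the term.

    t is either a string (variable or name) or a tuple
    (function symbol, arg1, ..., argk); sigma is a dict.
    """
    if isinstance(t, str):
        # Variables in dom(sigma) are replaced; everything else is kept.
        return sigma.get(t, t)
    return (t[0], *(apply_subst(a, sigma) for a in t[1:]))
```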

Our process algebra is parameterized by an equational theory (F, E), where E is a set of F-equations. By an F-equation, we mean a pair u = v where u, v ∈ T(F\N_priv, X) are terms that do not contain private names. We will assume that the equations of (F, E) can be oriented to produce a convergent rewrite system. Two terms u and v are said to be equal with respect to an equational theory (F, E), denoted u =_E v, if E ⊢ u = v in the first-order theory of equality. We often identify an equational theory (F, E) by E when the signature is clear from the context.

In the calculus, all communication is mediated through an adversary: all outputs first go to the adversary and all inputs are provided by the adversary. Hence, processes are executed in an environment that consists of a frame ϕ : X_w → T(F, ∅) and a ground substitution σ : X → T(F, ∅). Intuitively, ϕ represents the sequence of messages an adversary has received from protocol participants and σ records the binding of the protocol variables to actual input messages. An adversary is limited to sending only those messages that it can deduce from the messages it has received thus far. Formally, a term u ∈ T(F, ∅) is *deducible* from a frame ϕ with recipe r ∈ T(F\N_priv, dom(ϕ)) in equational theory E, denoted ϕ ⊢_E^r u, if rϕ =_E u. We will often omit r and E and write ϕ ⊢ u when they are clear from the context.

We now recall an equivalence on frames, called *static equivalence* [5]. Intuitively, two frames are statically equivalent if the adversary cannot distinguish them by performing tests; a test consists of checking whether two recipes deduce the same term. Formally, two frames ϕ_1 and ϕ_2 are said to be statically equivalent in equational theory E, denoted ϕ_1 ≡_E ϕ_2, if dom(ϕ_1) = dom(ϕ_2) and for all r_1, r_2 ∈ T(F\N_priv, X_w) we have r_1ϕ_1 =_E r_2ϕ_1 iff r_1ϕ_2 =_E r_2ϕ_2.

### **4.2 Process Syntax**

Processes in our calculus are the parallel composition of roles. Intuitively, a role models a single actor in a single session of the protocol. Syntactically, a role is derived from the grammar:

$$R ::= 0 \mid \mathbf{in}(x)^\ell \mid \mathsf{out}(u\_0 \cdot R +\_p u\_1 \cdot R)^\ell \mid \mathsf{ite}([c\_1 \wedge \ldots \wedge c\_k], R, R)^\ell \mid (R \cdot R)^\ell$$

where p is a rational number in the unit interval [0, 1], ℓ ∈ N, x ∈ X, u_0, u_1 ∈ T(F, X) and c_i is u_i = v_i with u_i, v_i ∈ T(F, X) for all i ∈ {1,...,k}. The constructs in(x)^ℓ, out(u_0 · R +_p u_1 · R)^ℓ and ite([c_1 ∧ ... ∧ c_k], R, R)^ℓ are said to be labeled operations and ℓ ∈ N is said to be their label. The role 0 does nothing. The role in(x) reads a term u from the public channel and binds it to x. The role out(u_0 · R +_p u_1 · R') outputs the term u_0 on the public channel and becomes R with probability p, and outputs the term u_1 and becomes R' with probability 1 − p. A test [c_1 ∧ ... ∧ c_k] is said to pass if for all 1 ≤ i ≤ k, the equality c_i holds. The conditional role ite([c_1 ∧ ... ∧ c_k], R, R') becomes R if [c_1 ∧ ... ∧ c_k] passes and otherwise it becomes R'. The role R · R' is the sequential composition of role R followed by role R'. The set of variables of a role R is the set of variables occurring in R. The construct in(x) · R binds the variable x in R. The sets of free and bound variables of a role can be defined in the standard way. We will assume that the sets of free and bound variables of a role are disjoint and that a bound variable is bound only once in a role.
A role R is said to be *well-formed* if every labeled operation occurring in R has the same label ℓ; ℓ is then said to be the label of the well-formed role R.

A process is the parallel composition of a finite set of roles R_1, ..., R_n, denoted R_1 | ... | R_n. We will use P and Q to denote processes. A process R_1 | ... | R_n is said to be well-formed if each role is well-formed, the sets of variables of R_i and R_j are disjoint for i ≠ j, and the labels of roles R_i and R_j are different for i ≠ j. For the remainder of this paper, processes are assumed to be well-formed. The set of free (resp. bound) variables of P is the union of the sets of free (resp. bound) variables of its roles. P is said to be ground if the set of its free variables is empty. We shall omit labels when they are not relevant in a particular context.

*Example 2.* We model the electronic voting protocol from Example 1 in our formalism. The protocol is built over the equational theory with signature <sup>F</sup> <sup>=</sup> {sk/1, pk/1, aenc/3, adec/2, pair/2, fst/1,snd/1} and the equations

$$\begin{aligned} E = \{ &\mathsf{adec}(\mathsf{aenc}(m, r, \mathsf{pk}(k)), \mathsf{sk}(k)) = m, \\ &\mathsf{fst}(\mathsf{pair}(m\_1, m\_2)) = m\_1, \; \mathsf{snd}(\mathsf{pair}(m\_1, m\_2)) = m\_2 \}. \end{aligned}$$
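Oriented left to right, these equations form a convergent rewrite system (indeed, a subterm convergent one, as discussed in Sect. 4.4), so every term has a unique normal form computable by innermost rewriting. A sketch under a tuple encoding of terms (the encoding and names are ours, for illustration only):

```python
def normalize(t):
    """Innermost normalization under the rules obtained by orienting E.

    Terms are nested tuples such as ('aenc', m, r, ('pk', k));
    names and variables are strings.
    """
    if isinstance(t, str):
        return t
    head, *args = t
    args = [normalize(a) for a in args]      # normalize subterms first
    # adec(aenc(m, r, pk(k)), sk(k)) -> m
    if (head == 'adec'
            and isinstance(args[0], tuple) and args[0][0] == 'aenc'
            and isinstance(args[1], tuple) and args[1][0] == 'sk'
            and args[0][3] == ('pk', args[1][1])):
        return args[0][1]
    # fst(pair(m1, m2)) -> m1,  snd(pair(m1, m2)) -> m2
    if (head in ('fst', 'snd')
            and isinstance(args[0], tuple) and args[0][0] == 'pair'):
        return args[0][1] if head == 'fst' else args[0][2]
    return (head, *args)
```

Because every right-hand side is a subterm of the left-hand side, one rewrite at the root of a term with normalized arguments already yields a normal form.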

The function sk (resp. pk) is used to generate a secret (resp. public) key from a nonce. For the generation of their public key pairs, Alice, Bob and the election authority hold private names k_A, k_B and k_EA, respectively. The candidates will be modeled using public names c_0 and c_1 and the tokens will be modeled using private names t_A and t_B. Additionally, we will write y_i and r_i for i ∈ N to denote fresh input variables and private names, respectively. The roles of Alice, Bob and the election authority are as follows.

$$\begin{split} A(c\_A) &:= \mathsf{in}(y\_0) \cdot \mathsf{out}(\mathsf{aenc}(\mathsf{pair}(\mathsf{adec}(y\_0, \mathsf{sk}(k\_A)), c\_A), r\_0, \mathsf{pk}(k\_{EA}))) \\ B(c\_B) &:= \mathsf{in}(y\_1) \cdot \mathsf{out}(\mathsf{aenc}(\mathsf{pair}(\mathsf{adec}(y\_1, \mathsf{sk}(k\_B)), c\_B), r\_1, \mathsf{pk}(k\_{EA}))) \\ EA &:= \mathsf{out}(\mathsf{aenc}(t\_A, r\_2, \mathsf{pk}(k\_A))) \cdot \mathsf{out}(\mathsf{aenc}(t\_B, r\_3, \mathsf{pk}(k\_B))) \cdot \mathsf{in}(y\_3) \cdot \mathsf{in}(y\_4) \cdot \\ &\qquad \mathsf{ite}([\mathsf{fst}(\mathsf{adec}(y\_3, \mathsf{sk}(k\_{EA}))) = t\_A \wedge \mathsf{fst}(\mathsf{adec}(y\_4, \mathsf{sk}(k\_{EA}))) = t\_B], \\ &\qquad\quad \mathsf{out}(\mathsf{pair}(\mathsf{snd}(\mathsf{adec}(y\_3, \mathsf{sk}(k\_{EA}))), \mathsf{snd}(\mathsf{adec}(y\_4, \mathsf{sk}(k\_{EA})))) \\ &\qquad\qquad +\_{1/2}\; \mathsf{pair}(\mathsf{snd}(\mathsf{adec}(y\_4, \mathsf{sk}(k\_{EA}))), \mathsf{snd}(\mathsf{adec}(y\_3, \mathsf{sk}(k\_{EA}))))), 0) \end{split}$$

In the roles above, we write out(u_0) as shorthand for out(u_0 · 0 +_1 u_0 · 0). The entire protocol is evote(c_A, c_B) = A(c_A) | B(c_B) | EA.

### **4.3 Process Semantics**

An extended process is a 3-tuple (P, ϕ, σ) where P is a process, ϕ is a frame and σ is a ground substitution whose domain contains the free variables of P. We will write E to denote the set of all extended processes. Semantically, a ground process P with n roles is a POMDP [[P]] = (Z, z_s, Act, Δ, O, obs), where Z = E ∪ {error}, z_s is (P, ∅, ∅), Act = (T(F\N_priv, X_w) ∪ {τ}) × {1,...,n}, Δ is a function that maps an extended process and an action to a distribution on E, O is the set of equivalence classes of frames under the static equivalence relation ≡_E, and obs is as follows. Let [ϕ] denote the equivalence class of ϕ with respect to ≡_E. Define obs to be the function such that for any extended process η = (P, ϕ, σ), obs(η) = [ϕ]. We now give some additional notation needed for the definition of Δ. Given a measure μ on E and a role R, we define μ · R to be the distribution μ_1 on E such that μ_1(P', ϕ, σ) = μ(P, ϕ, σ) if μ(P, ϕ, σ) > 0 and P' is P · R, and 0 otherwise. Given a measure μ on E and a process Q, we define μ | Q to be the distribution μ_1 on E such that μ_1(P', ϕ, σ) = μ(P, ϕ, σ) if μ(P, ϕ, σ) > 0 and P' is P | Q, and 0 otherwise. The distribution Q | μ is defined analogously.
For distributions μ_1, μ_2 over E and a rational number p ∈ [0, 1], the convex combination μ_1 +_p μ_2 is the distribution μ on E such that μ(η) = p · μ_1(η) + (1 − p) · μ_2(η) for all η ∈ E. The definition of Δ is given in Fig. 1, where we write (P, ϕ, σ) --α--> μ if Δ((P, ϕ, σ), α) = μ. If Δ((P, ϕ, σ), α) is not defined by any rule in Fig. 1 then Δ((P, ϕ, σ), α) = δ_error. Note that Δ is well-defined, as roles are deterministic.

### **4.4 Indistinguishability in Randomized Cryptographic Protocols**

Protocols P and P' are said to be indistinguishable if [[P]] ≈ [[P']]. Many interesting properties of randomized security protocols can be specified using indistinguishability. For example, consider the simple electronic voting protocol from Example 2. We say that the protocol satisfies the vote privacy property if evote(c_0, c_1) and evote(c_1, c_0) are indistinguishable.

In the remainder of this section, we study the problem of deciding when two protocols are indistinguishable by a bounded Dolev-Yao adversary. We restrict our attention to indistinguishability of protocols over subterm convergent equational theories [4]. Before presenting our results, we give some relevant definitions. (F, E) is said to be *subterm convergent* if for every equation

- **in**: if r ∈ T(F\N_priv, X_w), ϕ ⊢^r u and x ∉ dom(σ), then (in(x)^ℓ, ϕ, σ) --(r,ℓ)--> δ_{(0, ϕ, σ∪{x→u})}.
- **out**: if i = |dom(ϕ)| + 1 and ϕ_j = ϕ ∪ {w_{(i,ℓ)} → u_jσ} for j ∈ {0, 1}, then (out(u_0 · R_0 +_p u_1 · R_1)^ℓ, ϕ, σ) --(τ,ℓ)--> δ_{(R_0, ϕ_0, σ)} +_p δ_{(R_1, ϕ_1, σ)}.
- **condIF**: if for all i ∈ {1,...,k}, c_i is u_i = v_i and u_iσ =_E v_iσ, then (ite([c_1 ∧ ... ∧ c_k], R, R')^ℓ, ϕ, σ) --(τ,ℓ)--> δ_{(R, ϕ, σ)}.
- **condELSE**: if there exists i ∈ {1,...,k} such that c_i is u_i = v_i and u_iσ ≠_E v_iσ, then (ite([c_1 ∧ ... ∧ c_k], R, R')^ℓ, ϕ, σ) --(τ,ℓ)--> δ_{(R', ϕ, σ)}.
- **seq**: if R ≠ 0 and (R, ϕ, σ) --α--> μ, then (R · R', ϕ, σ) --α--> μ · R'.
- **null**: if (R, ϕ, σ) --α--> μ, then (0 · R, ϕ, σ) --α--> μ.
- **parl**: if (Q, ϕ, σ) --α--> μ, then (Q | Q', ϕ, σ) --α--> μ | Q'.
- **parr**: if (Q', ϕ, σ) --α--> μ, then (Q | Q', ϕ, σ) --α--> Q | μ.

**Fig. 1.** Process semantics

u = v ∈ E, oriented as a rewrite rule u → v, either v is a proper subterm of u or v is a public name. A term u can be represented as a directed acyclic graph (dag), denoted dag(u) [4,51]. Every node in dag(u) is labeled by a function symbol, name or variable. Nodes labeled by names and variables have out-degree 0. A node labeled with a function symbol f has out-degree equal to the arity of f, with the outgoing edges of the node labeled from 1 to the arity of f. Every node of dag(u) represents a unique subterm of u. The depth of a term u, denoted depth(u), is the length of the longest simple path from the root in dag(u). Given an action α, depth(α) = 0 if α = (τ, j), and depth(α) = m if α = (r, j) and depth(r) = m.
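Since dag(u) shares subterms but keeps the same root-to-leaf paths as the term tree, depth(u) coincides with the usual recursive term depth. A tiny sketch over the tuple encoding of terms used for illustration throughout (not the tool's representation):

```python
def depth(t):
    """Length of the longest path from the root in dag(t).

    Terms are nested tuples (function symbol, arg1, ..., argk);
    names and variables are strings and have depth 0.
    """
    if isinstance(t, str):
        return 0
    return 1 + max(depth(a) for a in t[1:])
```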

Let P be a protocol such that [[P]] = (Z, z_s, Act, Δ, O, obs). Define [[P]]_d to be the POMDP (Z, z_s, Act_d, Δ, O, obs) where Act_d ⊆ Act is such that every α ∈ Act_d has depth(α) ≤ d. For a constant d ∈ N, we define InDist(d) to be the decision problem that, given a subterm convergent theory (F, E) and protocols P and P' over (F, E), determines if [[P]]_d and [[P']]_d are indistinguishable. We assume that the arity of the function symbols in F is given in unary. We have the following.

**Theorem 1.** *For any constant d ∈ N, InDist(d) is in* **PSPACE***.*

We now show InDist(d) is both **NP**-hard and **coNP**-hard by showing a reduction from #SAT<sup>D</sup> to InDist(d). #SAT<sup>D</sup> is the decision problem that, given a 3CNF formula <sup>φ</sup> and a constant <sup>k</sup> <sup>∈</sup> <sup>N</sup>, checks if the number of satisfying assignments of φ is equal to k.

**Theorem 2.** *There is a d_0 ∈ N such that #SAT_D reduces to InDist(d) in logspace for every d > d_0. Thus, InDist(d) is* **NP***-hard and* **coNP***-hard for every d > d_0.*

# **5 Implementation and Evaluation**

Using (the proof of) Proposition 1, we can solve the indistinguishability problem for randomized security protocols as follows. For protocols P, P', translate [[P]], [[P']] into PFAs A, A' and determine if A ≡ A' using the linear programming algorithm from [31]. We will henceforth refer to this approach as the PFA algorithm and to the approach from Algorithm 1 as the OTF algorithm. We have implemented both the PFA and OTF algorithms as part of the Stochastic Protocol ANalyzer (Span), a Java-based tool for analyzing randomized security protocols. The tool is available for download at [1]. The main engine of Span translates a protocol into a POMDP, belief MDP or PFA by exploring all protocol executions and grouping equivalent states using a static equivalence engine, either Kiss [4] or Akiss [16]. Kiss is guaranteed to terminate for subterm convergent theories, while Akiss provides support for XOR and handles a slightly larger class of equational theories called *optimally reducing* theories. Operations from rewriting logic are provided by queries to Maude [24] and support for arbitrary-precision numbers is given by Apfloat [2]. Our experiments were conducted on an Intel Core i7 dual quad-core processor at 2.67 GHz with 12 GB of RAM. The host operating system was 64-bit Ubuntu 16.04.3 LTS.

Our comparison of the PFA and OTF algorithms began by examining how each approach scaled on a variety of examples (detailed at the end of this section). The results of the analysis are given in Fig. 2. For each example, we consider a fixed recipe depth and report the running times for 2 parties as well as the maximum number of parties for which one of the algorithms terminates within the timeout bound of 60 min. On small examples for which the protocols were indistinguishable, we found that the OTF and PFA algorithms were roughly equivalent. On large examples where the protocols were indistinguishable, such as the 3 ballot protocol, the PFA algorithm did not scale as well as the OTF algorithm. In particular, an out-of-memory exception often occurred during construction of the automata or the linear programming constraints. On examples for which the protocols were distinguishable, the OTF algorithm demonstrated a significant advantage. This was a result of the fact that the OTF approach analyzed the model as it was constructed. If at any point during model construction the bisimulation relation was determined not to hold, model construction was halted. By contrast, the PFA algorithm required the entire model to be constructed and stored before any analysis could take place.

In addition to stress-testing the tool, we also examined how each algorithm performed under various parameters of the mix-network example. The results are given in Fig. 3, where we examine how running times are affected by scaling the number of protocol participants and the recipe depth. Our results coincided with the observations from above. One interesting observation is that the number of beliefs explored on the 5 party example was identical for recipe depth 4 and recipe depth 10. The reason is that, for a given protocol input step, Span generates a


**Fig. 2.** Experimental results: columns 1 and 2 describe the example being analyzed. Column 3 gives the maximum recipe depth and column 4 indicates whether the example protocols were indistinguishable. Columns 5–8 give the running time (in seconds) for the respective algorithms and static equivalence engines. We report OOM for an out-of-memory exception and TO for a timeout, which occurs if no solution is generated in 60 min. Column 9 gives the number of states in the protocol's POMDP and column 10 gives the number of belief states explored by the OTF algorithm. When information could not be determined due to a failure of the tool to terminate, we report n/a. For protocols using equational theories that are not subterm convergent, we write n/s (not supported) for the Kiss engine.

minimal set of recipes, in the sense that if distinct recipes r_0, r_1 are generated at an input step with frame ϕ, then r_0ϕ ≠_E r_1ϕ. For the given number of public names available to the protocol, changing the recipe depth from 4 to 10 did not alter the number of unique terms that could be constructed by the attacker. We conclude this section by describing our benchmark examples, which are available at [3]. Evote is the simple electronic voting protocol derived from Example 2; the DC-net, mix-net and 3-ballot protocols are described below.

*Dining Cryptographers Networks.* In a simple DC-net protocol [38], two parties Alice and Bob want to anonymously publish two confidential bits m_A and m_B, respectively. To achieve this, Alice and Bob agree on three private random bits b_0, b_1 and b_2 and output a pair of messages according to the following scheme. In our modeling of the protocol, the private bits are generated by a trusted third party who communicates them to Alice and Bob using symmetric encryption.

$$\begin{array}{c} \text{If } b\_0 = 0 \\ \qquad \qquad \qquad \text{Alice: } M\_{A,0} = b\_1 \oplus m\_A, \; M\_{A,1} = b\_2 \\ \qquad \qquad \qquad \text{Bob: } M\_{B,0} = b\_1, \; M\_{B,1} = b\_2 \oplus m\_B \\ \text{If } b\_0 = 1 \\ \qquad \qquad \qquad \text{Alice: } M\_{A,0} = b\_1, \; M\_{A,1} = b\_2 \oplus m\_A \\ \qquad \qquad \qquad \text{Bob: } M\_{B,0} = b\_1 \oplus m\_B, \; M\_{B,1} = b\_2 \end{array}$$

From the protocol output, the messages m<sup>A</sup> and m<sup>B</sup> can be retrieved as <sup>M</sup>A,0⊕MB,<sup>0</sup> and <sup>M</sup>A,1⊕MB,1. The party to which the messages belong, however, remains unconditionally private, provided the exchanged secrets are not revealed.
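The XOR bookkeeping above is easy to check mechanically. A small sketch (function and variable names are ours, for illustration):

```python
def dcnet_publish(mA, mB, b0, b1, b2):
    """Published pairs of the two-party DC-net round described above.

    mA, mB are the confidential bits; b0, b1, b2 the shared secret bits.
    """
    if b0 == 0:
        alice = (b1 ^ mA, b2)
        bob = (b1, b2 ^ mB)
    else:
        alice = (b1, b2 ^ mA)
        bob = (b1 ^ mB, b2)
    return alice, bob
```

XOR-ing the two published pairs componentwise always recovers the pair of bits {m_A, m_B}, but which party contributed which bit stays hidden behind the secret b_0, b_1, b_2.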


**Fig. 3.** Detailed Experimental Results for Mix Networks: The columns have an identical meaning to the ones from Fig. 2. We report OOM for an out of memory exception and when information could not be determined due to a failure of the tool to terminate, we report n/a.

*Mix Networks.* A mix-network [21] is a routing protocol used to break the link between a message's sender and the message. This is achieved by routing messages through a series of proxy servers, called mixes. Each mix collects a batch of encrypted messages, privately decrypts each message and forwards the resulting messages in random order. More formally, consider a sender Alice (A) who wishes to send a message m to Bob (B) through Mix (M). Alice prepares a cipher-text of the form aenc(aenc(m, n_1, pk(B)), n_0, pk(M)), where aenc is asymmetric encryption, n_0, n_1 are nonces and pk(M), pk(B) are the public keys of the Mix and Bob, respectively. Upon receiving a batch of N such cipher-texts, the Mix unwraps the outer layer of encryption on each message using its secret key, randomly permutes the messages and forwards them. A passive attacker, who observes all the traffic but does not otherwise modify the network, cannot (with high probability) correlate messages entering and exiting the Mix. Unfortunately, this simple design, known as a *threshold mix*, is vulnerable to a very simple active attack. To expose Alice as the sender of the message aenc(m, n_1, pk(B)), an attacker simply forwards Alice's message along with N − 1 dummy messages of its own to the Mix. In this way, the attacker can distinguish which of the Mix's N output messages is not a dummy message and hence must have originated from Alice.

*3-Ballot Electronic Voting.* We have modeled and analyzed the 3-ballot voting system from [54]. To simplify the presentation of this model, we first describe the major concepts behind 3-ballot voting schemes, as originally introduced in [50]. At the polling station, each voter is given 3 ballots at random. A ballot consists of a list of candidates and a ballot ID. When casting a vote, a voter begins by placing exactly one mark next to each candidate on one of the three ballots, chosen at random. An additional mark is then placed next to the desired candidate on one of the ballots, again chosen at random. At the completion of the procedure, at least one mark should have been placed on each ballot and two ballots should have marks corresponding to the desired candidate. Once all of the votes have been cast, ballots are collected and released to a public bulletin board. Each voter retains a copy of one of the three ballots as a receipt, which can be used to verify that his/her vote was counted.
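The tallying arithmetic can be sketched in Python. Since each voter contributes one baseline mark per candidate plus one extra mark for their choice, a candidate's vote count is their total mark count minus the number of voters. This is a simplification with helper names of our own choosing, and it ignores the per-ballot constraint that every ballot carry at least one mark:

```python
import random

def cast_vote(choice, candidates):
    """One voter's three ballots, as mark dictionaries (ballot IDs omitted)."""
    ballots = [{c: 0 for c in candidates} for _ in range(3)]
    for c in candidates:                  # one baseline mark per candidate,
        random.choice(ballots)[c] = 1     # placed on a random ballot
    # the extra mark for the chosen candidate goes on a ballot that does
    # not already carry a mark for that candidate
    random.choice([b for b in ballots if b[choice] == 0])[choice] = 1
    return ballots

def tally(board, candidates, num_voters):
    """votes_c = marks_c - num_voters, by the counting argument above."""
    return {c: sum(b[c] for b in board) - num_voters for c in candidates}

random.seed(7)                            # any seed yields the same tally
cands = ["X", "Y"]
choices = ["X", "X", "Y"]
board = [b for v in choices for b in cast_vote(v, cands)]
result = tally(board, cands, len(choices))   # {'X': 2, 'Y': 1}
```

The random placement of marks means no single ballot on the bulletin board reveals its voter's choice, yet the tally is exact.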

In the full protocol, a registration agent is responsible for authenticating voters and receiving ballots and ballot IDs generated by a vote manager. Once a voter marks his/her set of three ballots, they are returned to the vote manager, who forwards them to one of three vote repositories. The vote repositories store the ballots they receive in a random position. After all votes have been collected in the repositories, they are released to a bulletin board by a vote collector. Communication between the registration agent, vote manager, vote repositories and vote collector is encrypted using asymmetric encryption and authenticated using digital signatures. In our modeling, we assume all parties behave honestly.

# **6 Conclusion**

In this paper, we have considered the problem of model checking indistinguishability in randomized security protocols that are executed with respect to a Dolev-Yao adversary. We have presented two different algorithms for the indistinguishability problem assuming bounded recipe sizes. The algorithms have been implemented in the Span protocol analysis tool, which has been used to verify several well-known randomized security protocols. We propose the following as future work: (i) extension of the current algorithms, as well as the tool, to the case of unbounded recipe sizes; (ii) application of the tool to checking other randomized protocols; (iii) establishing tight upper and lower bounds for the indistinguishability problem for randomized protocols.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Lazy Self-composition for Security Verification**

Weikun Yang<sup>1</sup>, Yakir Vizel<sup>1,3(B)</sup>, Pramod Subramanyan<sup>2</sup>, Aarti Gupta<sup>1</sup>, and Sharad Malik<sup>1</sup>

> <sup>1</sup> Princeton University, Princeton, USA
> <sup>2</sup> University of California, Berkeley, Berkeley, USA
> <sup>3</sup> The Technion, Haifa, Israel
> yvizel@cs.technion.ac.il

**Abstract.** The secure information flow problem, which checks whether low-security outputs of a program are influenced by high-security inputs, has many applications in verifying security properties in programs. In this paper we present *lazy* self-composition, an approach for verifying secure information flow. It is based on self-composition, where two copies of a program are created on which a safety property is checked. However, rather than an eager duplication of the given program, it uses duplication lazily to reduce the cost of verification. This lazy self-composition is guided by an interplay between symbolic taint analysis on an abstract (single copy) model and safety verification on a refined (two copy) model. We propose two verification methods based on lazy self-composition. The first is a CEGAR-style procedure, where the abstract model associated with taint analysis is refined, on demand, by using a model generated by lazy self-composition. The second is a method based on bounded model checking, where taint queries are generated dynamically during program unrolling to guide lazy self-composition and to conclude an adequate bound for correctness. We have implemented these methods on top of the SeaHorn verification platform and our evaluations show the effectiveness of lazy self-composition.

# **1 Introduction**

Many security properties can be cast as the problem of verifying secure information flow. A standard approach to verifying secure information flow is to reduce it to a safety verification problem on a "self-composition" of the program, i.e., two "copies" of the program are created [5] and analyzed. For example, to check for information leaks or non-interference [17], low-security (public) inputs are initialized to identical values in the two copies of the program, while high-security (confidential) inputs are unconstrained and can take different values. The safety check ensures that in all executions of the two-copy program, the values of the low-security (public) outputs are identical, i.e., there is no information leak from confidential inputs to public outputs. The self-composition approach is useful for

This work was supported in part by NSF Grant 1525936.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 136–156, 2018. https://doi.org/10.1007/978-3-319-96142-2_11

checking general hyper-properties [11], and has been used in other applications, such as verifying constant-time code for security [1] and k-safety properties of functions like injectivity and monotonicity [32].

Although the self-composition reduction is sound and complete, it is challenging in practice to check safety properties on two copies of a program. There have been many efforts to reduce the cost of verification on self-composed programs, e.g., by use of type-based analysis [33], constructing product programs with aligned fragments [4], lockstep execution of loops [32], transforming Horn clause rules [14,24], etc. The underlying theme in these efforts is to make it easier to derive *relational* invariants between the two copies, e.g., by keeping corresponding variables in the two copies near each other.

In this paper, we aim to improve the self-composition approach by making it *lazier* in contrast to eager duplication into two copies of a program. Specifically, we use symbolic taint analysis to track flow of information from high-security inputs to other program variables. (This is similar to dynamic taint analysis [30], but covers all possible inputs due to static verification.) This analysis works on an abstract model of a single copy of the program and employs standard model checking techniques for achieving precision and path sensitivity. When this abstraction shows a counterexample, we refine it using on-demand duplication of relevant parts of the program. Thus, our *lazy self-composition*<sup>1</sup> approach is guided by an interplay between symbolic taint analysis on an abstract (single copy) model and safety verification on a refined (two copy) model.

We describe two distinct verification methods based on lazy self-composition. The first is an iterative procedure for unbounded verification based on counterexample guided abstraction refinement (CEGAR) [9]. Here, the taint analysis provides a sound over-approximation for secure information flow, i.e., if a low-security output is proved to be untainted, then it is guaranteed to not leak any information. However, even a path-sensitive taint analysis can sometimes lead to "false alarms", i.e., a low-security output is tainted, but its value is unaffected by high-security inputs. For example, this can occur when a branch depends on a tainted variable, but the same (semantic, and not necessarily syntactic) value is assigned to a low-security output on both branches. Such false alarms for security due to taint analysis are then refined by lazily duplicating relevant parts of a program, and performing a safety check on the composed two-copy program. Furthermore, we use relational invariants derived on the latter to strengthen the abstraction within the iterative procedure.

Our second method also takes a similar abstraction-refinement view, but in the framework of bounded model checking (BMC) [6]. Here, we dynamically generate taint queries (in the abstract single copy model) during program unrolling, and use their result to simplify the duplication for self-composition (in the two copy model). Specifically, the second copy duplicates the statements (update logic) only if the taint query shows that the updated variable is possibly tainted. Furthermore, we propose a specialized early termination check for the BMC-based method. In many secure programs, sensitive information is propagated in a localized context, but conditions exist that squash its propagation any further. We formulate the early termination check as a taint check on all live variables at the end of a loop body, i.e., if no live variable is tainted, then we can conclude that the program is secure without further loop unrolling. (This is under the standard assumption that inputs are tainted in the initial state. The early termination check can be suitably modified if tainted inputs are allowed to occur later.) Since our taint analysis is precise and path-sensitive, this approach can be beneficial in practice by not unrolling the loops past the point where all taint has been squashed.

<sup>1</sup> This name is inspired by the *lazy abstraction* approach [20] for software model checking.

We have implemented these methods in the SeaHorn verification platform [18], which represents programs as CHC (Constrained Horn Clause) rules. Our prototype for taint analysis is flexible, with a fully symbolic encoding of the taint policy (i.e., rules for taint generation, propagation, and removal). It fully leverages SMT-based model checking techniques for precise taint analysis. Our prototypes allow rich security specifications in terms of annotations on low/high-security variables and locations in arrays, and predicates that allow information downgrading in specified contexts.

We present an experimental evaluation on benchmark examples. Our results clearly show the benefits of lazy self-composition vs. eager self-composition, where the former is much faster and allows verification to complete in larger examples. Our initial motivation in proposing the two verification methods was that we would find examples where one or the other method is better. We expect that easier proofs are likely to be found by the CEGAR-based method, and easier bugs by the BMC-based method. As it turns out, most of our benchmark examples are easy to handle by both methods so far. We believe that our general approach of lazy self-composition would be beneficial in other verification methods, and both our methods show its effectiveness in practice.

To summarize, this paper makes the following contributions.


```
int steps = 0;
for (i = 0; i < N; i++) { zero[i] = product[i] = 0; }
for (i = 0; i < N * W; i++) {
    int bi = bigint_extract_bit(a, i);
    if (bi == 1) {
        bigint_shiftleft(b, i, shifted_b, &steps);
        bigint_add(product, shifted_b, product, &steps);
    } else {
        bigint_shiftleft(zero, i, shifted_zero, &steps);
        bigint_add(product, shifted_zero, product, &steps);
    }
}
```
**Listing 1.** "BigInt" Multiplication

Our evaluations show that both methods easily outperform an eager self-composition that uses the same backend verification engines.

# **2 Motivating Example**

Listing 1 shows a snippet from a function that performs multiword multiplication. The code snippet is instrumented to count the number of iterations of the inner loops executed in bigint_shiftleft and bigint_add (not shown for brevity). These iterations are counted in the variable steps. The security requirement is that steps must not depend on the secret values in the array a; array b is assumed to be public.

Static analyses, including those based on security types, will conclude that the variable steps is "high-security." This is because steps is assigned in a conditional branch that depends on the high-security variable bi. However, this code is in fact safe because steps is incremented by the same value in both branches of the conditional statement.

Our lazy self-composition will handle this example by first using a symbolic taint analysis to conclude that the variable steps is tainted. It will then self-compose only those parts of the program related to the computation of steps, and discover that steps is set to identical values in both copies, thus proving the program secure.
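A toy Python model of Listing 1's control skeleton illustrates why the alarm is false, under our simplifying assumption that each helper call charges one unit to steps:

```python
def run(a_bits):
    """Control skeleton of Listing 1: the branch tests a secret bit, but
    both branches call the same helpers, so they charge the same
    (assumed unit) cost to steps."""
    steps = 0
    for bi in a_bits:
        if bi == 1:
            steps += 2   # bigint_shiftleft + bigint_add on b
        else:
            steps += 2   # the same helpers applied to the zero array
    return steps

# Self-composition view: fix the public inputs, vary only the secret bits.
# Taint analysis flags steps (it is assigned under a secret-dependent
# branch), yet the two copies always agree on it: no actual leak.
no_leak = run([1, 0, 1, 1]) == run([0, 0, 0, 0])
```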

Now consider the case when the code in Listing 1 is used to multiply two "bigints" of differing widths, e.g., a 512-bit integer is multiplied by a 2048-bit integer. If this occurs, the upper 1536 bits of a will all be zeros, and bi will not be a high-security variable for these iterations of the loop. Such a scenario can benefit from early termination in our BMC-based method: our analysis will determine that no tainted value flows to the low-security variable steps after iteration 512 and will immediately terminate the analysis.
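A minimal sketch of this early-termination idea, under the simplifying assumptions that bi is the only live variable whose taint we track and that the bits of a above the secret's width are public zeros (all names are ours):

```python
def bmc_with_early_termination(secret_width, total_width):
    """Unroll the loop of Listing 1; terminate as soon as no live variable
    at the end of the loop body can still be tainted.  Here bi is derived
    from bit i of the secret a, so it is untainted once i passes the
    secret's width."""
    for i in range(total_width):
        tainted_live = {"bi"} if i < secret_width else set()
        if not tainted_live:
            return i          # safe to stop unrolling at this bound
    return total_width

bound = bmc_with_early_termination(512, 2048)   # stops at iteration 512
```

Instead of unrolling all 2048 iterations, the BMC loop stops as soon as the taint has been squashed.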

### **3 Preliminaries**

We consider first-order logic modulo a theory $\mathcal{T}$ and denote it by FOL($\mathcal{T}$). Given a program $P$, we define a *safety verification* problem w.r.t. $P$ as a transition system $M = \langle X, \mathit{Init}(X), \mathit{Tr}(X, X'), \mathit{Bad}(X) \rangle$, where $X$ denotes a set of (uninterpreted) constants representing program variables; $\mathit{Init}$, $\mathit{Tr}$ and $\mathit{Bad}$ are (quantifier-free) formulas in FOL($\mathcal{T}$) representing the initial states, transition relation and bad states, respectively. The states of a transition system correspond to structures over a signature $\Sigma = \Sigma_{\mathcal{T}} \cup X$. We write $\mathit{Tr}(X, X')$ to denote that $\mathit{Tr}$ is defined over the signature $\Sigma_{\mathcal{T}} \cup X \cup X'$, where $X$ is used to represent the pre-state of a transition, and $X' = \{a' \mid a \in X\}$ is used to represent the post-state.

A safety verification problem is to decide whether a transition system M is SAFE or UNSAFE. We say that M is UNSAFE iff there exists a number N such that the following formula is satisfiable:

$$\mathit{Init}(X_0) \wedge \left(\bigwedge_{i=0}^{N-1} \mathit{Tr}(X_i, X_{i+1})\right) \wedge \mathit{Bad}(X_N) \tag{1}$$

where $X_i = \{a_i \mid a \in X\}$ is a copy of the program variables (uninterpreted constants) used to represent the state of the system after the execution of $i$ steps.

When $M$ is UNSAFE and $s_N \in \mathit{Bad}$ is reachable, the path from $s_0 \in \mathit{Init}$ to $s_N$ is called a *counterexample* (CEX).

A transition system $M$ is SAFE iff it has no counterexample of any length. Equivalently, $M$ is SAFE iff there exists a formula $\mathit{Inv}$, called a *safe inductive invariant*, that satisfies: (i) $\mathit{Init}(X) \to \mathit{Inv}(X)$, (ii) $\mathit{Inv}(X) \wedge \mathit{Tr}(X, X') \to \mathit{Inv}(X')$, and (iii) $\mathit{Inv}(X) \to \neg\mathit{Bad}(X)$.
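For finite-state systems, conditions (i)-(iii) can be checked by explicit enumeration. The following Python sketch (our own illustration, not an encoding used by any tool mentioned here) treats Init, Tr and Bad as predicates over a toy system:

```python
from itertools import product

# Toy finite transition system: a counter over {0,...,7} that wraps
# before reaching the bad value 7; the self-loop keeps Tr reflexive
# ("stuttering"), as assumed later in Sect. 4.
STATES = range(8)
def init(s): return s == 0
def tr(s, t): return t == (s + 1) % 7 or t == s
def bad(s): return s == 7

def is_safe_inductive_invariant(inv):
    """Check conditions (i)-(iii) by explicit enumeration."""
    return (all(inv(s) for s in STATES if init(s))                    # (i)
            and all(inv(t) for s, t in product(STATES, repeat=2)
                    if inv(s) and tr(s, t))                           # (ii)
            and all(not bad(s) for s in STATES if inv(s)))            # (iii)

assert is_safe_inductive_invariant(lambda s: s < 7)       # safe invariant
assert not is_safe_inductive_invariant(lambda s: s == 0)  # not inductive
```

The second check fails because $\{0\}$ is not closed under the transition relation, even though it contains no bad state.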

In SAT-based model checking (e.g., based on IC3 [7] or interpolants [23,34]), the verification procedure maintains an *inductive trace* of formulas $[F_0(X), \ldots, F_N(X)]$ that satisfy: (i) $\mathit{Init}(X) \to F_0(X)$, (ii) $F_i(X) \wedge \mathit{Tr}(X, X') \to F_{i+1}(X')$ for every $0 \le i < N$, and (iii) $F_i(X) \to \neg\mathit{Bad}(X)$ for every $0 \le i \le N$. A trace $[F_0, \ldots, F_N]$ is *closed* if $\exists 1 \le i \le N \cdot F_i \Rightarrow \bigvee_{j=0}^{i-1} F_j$. There is an obvious relationship between the existence of closed traces and safety of a transition system: *a transition system $M$ is SAFE iff it admits a safe closed trace.* Thus, safety verification is reduced to searching for a safe closed trace or finding a CEX.

# **4 Information Flow Analysis**

Let $P$ be a program over a set of program variables $X$. Recall that $\mathit{Init}(X)$ is a formula describing the initial states and $\mathit{Tr}(X, X')$ a transition relation. We assume a "stuttering" transition relation, namely, $\mathit{Tr}$ is reflexive and therefore can non-deterministically either move to the next state or stay in the same state. Let us assume that $H \subset X$ is a set of high-security variables and $L := X \setminus H$ is the set of low-security variables.

For each $x \in L$, let $\mathit{Obs}_x(X)$ be a predicate over program variables $X$ that determines when variable $x$ is adversary-observable. The precise definition of $\mathit{Obs}_x(X)$ depends on the threat model being considered. A simple model would be that for each low variable $x \in L$, $\mathit{Obs}_x(X)$ holds only at program completion – this corresponds to a threat model where the adversary can run a program that operates on some confidential data and observe its public (low-security) outputs after completion. A more sophisticated definition of $\mathit{Obs}_x(X)$ could consider, for example, a concurrently executing adversary. Appropriate definitions of $\mathit{Obs}_x(X)$ can also model declassification [29], by setting $\mathit{Obs}_x(X)$ to be false in program states where the declassification of $x$ is allowed.

The *information flow* problem checks whether there exists an execution of $P$ such that the value of the variables in $H$ affects a variable $x \in L$ in some state where the predicate $\mathit{Obs}_x(X)$ holds. Intuitively, information flow analysis checks if low-security variables "leak" information about high-security variables.

We now describe our formulations of two standard techniques that have been used to perform information flow analysis. The first is based on taint analysis [30], but we use a symbolic (rather than a dynamic) analysis that tracks taint in a path-sensitive manner over the program. The second is based on self-composition [5], where two copies of the program are created and a safety property is checked over the composed program.

### **4.1 Symbolic Taint Analysis**

When using taint analysis for checking information flow, we mark high-security variables with a "taint" and check if this taint can propagate to low-security variables. The propagation of taint through the program variables of $P$ is determined by both assignments and the control structure of $P$. In order to perform precise taint analysis, we formulate it as a safety verification problem. For this purpose, for each program variable $x \in X$, we introduce a new "taint" variable $x_t$. Let $X_t := \{x_t \mid x \in X\}$ be the set of taint variables, where each $x_t \in X_t$ is of sort Boolean. Let us define a transition system $M_t := \langle Y, \mathit{Init}_t, \mathit{Tr}_t, \mathit{Bad}_t \rangle$ where $Y := X \cup X_t$ and

$$\mathit{Init}_t(Y) := \mathit{Init}(X) \wedge \left(\bigwedge_{x \in H} x_t\right) \wedge \left(\bigwedge_{x \in L} \neg x_t\right) \tag{2}$$

$$\mathit{Tr}_t(Y, Y') := \mathit{Tr}(X, X') \wedge \hat{\mathit{Tr}}(Y, X'_t) \tag{3}$$

$$\mathit{Bad}_t(Y) := \left(\bigvee_{x \in L} \mathit{Obs}_x(X) \wedge x_t\right) \tag{4}$$

Since taint analysis tracks information flow from high-security to low-security variables, variables in $H_t$ are initialized to *true* while variables in $L_t$ are initialized to *false*. W.l.o.g., let us denote the state update for a program variable $x \in X$ as: $x' = \mathit{cond}(X)\ ?\ \varphi_1(X) : \varphi_2(X)$. Let $\varphi$ be a formula over $\Sigma$. We capture the taint of $\varphi$ by:

$$\Theta(\varphi) = \begin{cases} \mathit{false} & \text{if } \varphi \cap X = \emptyset \\ \bigvee_{x \in \varphi} x_t & \text{otherwise} \end{cases}$$

Thus, $\hat{\mathit{Tr}}(Y, X'_t)$ is defined as: $\bigwedge_{x_t \in X_t} x'_t = \Theta(\mathit{cond}) \vee (\mathit{cond}\ ?\ \Theta(\varphi_1) : \Theta(\varphi_2))$

Intuitively, taint may propagate from $x_1$ to $x_2$ either when $x_2$ is assigned an expression that involves $x_1$ or when an assignment to $x_2$ is controlled by $x_1$. The bad states ($\mathit{Bad}_t$) are all states where a low-security variable is tainted and observable.
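The propagation rule can be sketched in Python over sets of variable names. This version conservatively takes $\Theta(\varphi_1) \vee \Theta(\varphi_2)$ when the branch condition's value is unknown, a coarser over-approximation of the path-sensitive rule above; all names are our own:

```python
def theta(expr_vars, taint):
    """Θ(φ): an expression is tainted iff it mentions a tainted variable."""
    return any(taint[v] for v in expr_vars)

def step_taint(taint, updates):
    """One taint-transition step.  Each update maps x to the variable sets
    of (cond, phi1, phi2) in x' = cond ? phi1 : phi2; the new taint of x
    is Θ(cond) ∨ Θ(phi1) ∨ Θ(phi2), over-approximating the branch."""
    new = dict(taint)
    for x, (cond_vars, phi1_vars, phi2_vars) in updates.items():
        new[x] = (theta(cond_vars, taint) or
                  theta(phi1_vars, taint) or
                  theta(phi2_vars, taint))
    return new

# From Listing 1: steps' = (bi == 1) ? steps + c : steps + c.
# bi is derived from the secret array a, so it starts tainted.
taint = {"bi": True, "steps": False, "c": False}
updates = {"steps": ({"bi"}, {"steps", "c"}, {"steps", "c"})}
after = step_taint(taint, updates)
# The implicit flow through the branch taints steps, even though both
# branches assign semantically equal values -- the false alarm that
# lazy self-composition later refines away.
```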

### **4.2 Self-composition**

When using self-composition, information flow is tracked over an execution of two copies of the program, $P$ and $P_d$. Let us denote by $X_d := \{x_d \mid x \in X\}$ the set of program variables of $P_d$. Similarly, let $\mathit{Init}_d(X_d)$ and $\mathit{Tr}_d(X_d, X'_d)$ denote the initial states and transition relation of $P_d$. Note that $\mathit{Init}_d$ and $\mathit{Tr}_d$ are computed from $\mathit{Init}$ and $\mathit{Tr}$ by means of substitution, namely, by substituting every occurrence of $x \in X$ or $x' \in X'$ with $x_d \in X_d$ or $x'_d \in X'_d$, respectively. Similarly to taint analysis, we formulate information flow over a self-composed program as a safety verification problem: $M_d := \langle Z, \mathit{Init}_d, \mathit{Tr}_d, \mathit{Bad}_d \rangle$ where $Z := X \cup X_d$ and

$$\mathit{Init}_d(Z) := \mathit{Init}(X) \wedge \mathit{Init}(X_d) \wedge \left(\bigwedge_{x \in L} x = x_d\right) \tag{5}$$

$$\mathit{Tr}_d(Z, Z') := \mathit{Tr}(X, X') \wedge \mathit{Tr}(X_d, X'_d) \tag{6}$$

$$\mathit{Bad}_d(Z) := \left(\bigvee_{x \in L} \mathit{Obs}_x(X) \wedge \mathit{Obs}_x(X_d) \wedge \neg(x = x_d)\right) \tag{7}$$

In order to track information flow, variables in $L_d$ are initialized to be equal to their counterparts in $L$, while variables in $H_d$ remain unconstrained. A leak is captured by the bad states (i.e., $\mathit{Bad}_d$). More precisely, there exists a leak iff there exists an execution of $M_d$ that results in a state where $\mathit{Obs}_x(X)$ and $\mathit{Obs}_x(X_d)$ hold and $x \neq x_d$ for a low-security variable $x \in L$.
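For finite input domains, the $\mathit{Bad}_d$ check amounts to searching for two runs that agree on the low inputs but differ on a low output. A brute-force Python sketch with toy programs of our own:

```python
from itertools import product

def leaky(h, l):
    """h is the high (secret) input, l the low (public) one."""
    return l + (1 if h > 0 else 0)   # low output depends on h: a leak

def constant_fn(h, l):
    return l * 2                     # independent of h: no leak

def leaks(prog, highs, lows):
    """Eager self-composition by enumeration: run two copies on equal low
    inputs and every pair of high inputs; Bad_d holds if the low outputs
    of some pair of runs differ."""
    return any(prog(h1, l) != prog(h2, l)
               for l in lows
               for h1, h2 in product(highs, repeat=2))

assert leaks(leaky, range(-2, 3), range(2))
assert not leaks(constant_fn, range(-2, 3), range(2))
```

Symbolic self-composition replaces this enumeration with a single safety query over $M_d$, but the duplication cost is what lazy self-composition, described next, seeks to avoid.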

### **5 Lazy Self-composition for Information Flow Analysis**

In this section, we introduce lazy self-composition for information flow analysis. It is based on an interplay between symbolic taint analysis on a single copy and safety verification on a self-composition, which were both described in the previous section.

Recall that taint analysis is imprecise for determining secure information flow in the sense that it may report spurious counterexamples, namely, spurious leaks. In contrast, self-composition is precise, but less efficient: the fact that self-composition requires a duplication of the program often hinders its performance. The main motivation for lazy self-composition is to target both efficiency and precision.

Intuitively, the model for symbolic taint analysis $M_t$ can be viewed as an abstraction of the self-composed model $M_d$, where the Boolean variables in $M_t$ are predicates tracking the states where $x \neq x_d$ for some $x \in X$. This intuition is captured by the following statement: $M_t$ *over-approximates* $M_d$.

**Corollary 1.** *If there exists a path in $M_d$ from $\mathit{Init}_d$ to $\mathit{Bad}_d$, then there exists a path in $M_t$ from $\mathit{Init}_t$ to $\mathit{Bad}_t$.*

**Corollary 2.** *If there exists no path in $M_t$ from $\mathit{Init}_t$ to $\mathit{Bad}_t$, then there exists no path in $M_d$ from $\mathit{Init}_d$ to $\mathit{Bad}_d$.*

This abstraction-based view relating symbolic taint analysis and self-composition can be exploited in different verification methods for checking secure information flow. In this paper, we focus on two – a CEGAR-based method (Ifc-CEGAR) and a BMC-based method (Ifc-BMC). These methods, both using lazy self-composition, are now described in detail.

#### **5.1 IFC-CEGAR**

We make use of the fact that $M_t$ can be viewed as an abstraction of $M_d$, and propose an abstraction-refinement paradigm for secure information flow analysis. In this setting, $M_t$ is used to find a possible counterexample, i.e., a path that leaks information. Then, $M_d$ is used to check whether this counterexample is spurious or real. In case the counterexample is found to be spurious, Ifc-CEGAR uses the proof that shows why the counterexample is not possible in $M_d$ to refine $M_t$.

A sketch of Ifc-CEGAR appears in Algorithm 1. Recall that we assume that solving a safety verification problem is done by maintaining an inductive trace. We denote the traces for $M_t$ and $M_d$ by $G = [G_0, \ldots, G_k]$ and $H = [H_0, \ldots, H_k]$, respectively. Ifc-CEGAR starts by initializing $M_t$, $M_d$ and their respective traces $G$ and $H$ (lines 1–4). The main loop of Ifc-CEGAR (lines 5–18) starts by looking for a counterexample over $M_t$ (line 6). In case no counterexample is found, Ifc-CEGAR declares there are no leaks and returns SAFE.

If a counterexample $\pi$ is found in $M_t$, Ifc-CEGAR first updates the trace $H$ of $M_d$ by rewriting $G$ (line 10). In order to check if $\pi$ is spurious, Ifc-CEGAR creates a new safety verification problem $M_c$, a version of $M_d$ constrained by $\pi$ (line 11), and solves it (line 12). If $M_c$ has a counterexample, Ifc-CEGAR returns UNSAFE. Otherwise, $G$ is updated by $H$ (line 16) and $M_t$ is refined such that $\pi$ is ruled out (line 17).

The above gives a high-level overview of how Ifc-CEGAR operates. We now describe the functions ReWrite, Constraint and Refine in more detail. We note that these functions can be designed and implemented in several different ways; in what follows we describe some possible choices.

**Proof-Based Abstraction.** Let us assume that when solving $M_t$, a counterexample $\pi$ of length $k$ is found and an inductive trace $G$ is computed. Following a proof-based abstraction approach, Constraint() uses the length of $\pi$ to bound the length of possible executions in $M_d$ by $k$. Intuitively, this is similar to bounding the length of the computed inductive trace over $M_d$.

In case $M_c$ has a counterexample, a real leak (of length $k$) is found. Otherwise, since $M_c$ considers all possible executions of $M_d$ of length $k$, Ifc-CEGAR

### **Algorithm 1.** Ifc-CEGAR (P,H)

```
Input: A program P and a set of high-security variables H
Output: SAFE, UNSAFE or UNKNOWN.
 1 M_t ← ConstructTaintModel(P, H)
 2 M_d ← ConstructSCModel(P, H)
 3 G ← [G_0 = Init_t]
 4 H ← [H_0 = Init_d]
 5 repeat
 6     (G, R_taint, π) ← MC.Solve(M_t, G)
 7     if R_taint = SAFE then
 8         return SAFE
 9     else
10         H ← ReWrite(G, H)
11         M_c ← Constraint(M_d, π)
12         (H, R_s, π) ← MC.Solve(M_c, H)
13         if R_s = UNSAFE then
14             return UNSAFE
15         else
16             G ← ReWrite(H, G)
17             M_t ← Refine(M_t, G)
18 until ∞
19 return UNKNOWN
```
deduces that there are no counterexamples of length $k$. In particular, the counterexample $\pi$ is ruled out. Ifc-CEGAR therefore uses this fact to refine $M_t$ and $G$.

**Inductive Trace Rewriting.** Consider the set of program variables $X$, taint variables $X_t$, and self-composition variables $X_d$. As noted above, $M_t$ over-approximates $M_d$. Intuitively, it may mark a variable $x$ as tainted when $x$ does not leak information. Equivalently, if a variable $x$ is found to be untainted in $M_t$, then it is known to also not leak information in $M_d$. More formally, the following relation holds: $\neg x_t \to (x = x_d)$.

This gives us a procedure for rewriting a trace over $M_t$ to a trace over $M_d$. Let $G = [G_0, \ldots, G_k]$ be an inductive trace over $M_t$. Considering the definition of $M_t$, $G$ can be decomposed and rewritten as $G_i(Y) := \bar{G}_i(X) \wedge \bar{G}^t_i(X_t) \wedge \psi(X, X_t)$. Namely, $\bar{G}_i(X)$ and $\bar{G}^t_i(X_t)$ are sub-formulas of $G_i$ over only the $X$ and $X_t$ variables, respectively, and $\psi(X, X_t)$ is the part connecting $X$ and $X_t$.

Since $G$ is an inductive trace, $G_i(Y) \wedge \mathit{Tr}_t(Y, Y') \to G_{i+1}(Y')$ holds. Following the definition of $\mathit{Tr}_t$ and the above decomposition of $G_i$, the following holds:

$$\bar{G}_i(X) \wedge \mathit{Tr}(X, X') \to \bar{G}_{i+1}(X')$$

Let $H = [H_0, \ldots, H_k]$ be a trace w.r.t. $M_d$. We define the *update* of $H$ by $G$ as the trace $H^* = [H^*_0, \ldots, H^*_k]$, defined as follows:

$$H^*_0 := \mathit{Init}_d \tag{8}$$

$$H^*_i(Z) := H_i(Z) \wedge \bar{G}_i(X) \wedge \bar{G}_i(X_d) \wedge \left(\bigwedge \{x = x_d \mid G_i(Y) \to \neg x_t\}\right) \tag{9}$$

Intuitively, if a variable $x \in X$ is known to be untainted in $M_t$, using Corollary 2 we conclude that $x = x_d$ in $M_d$.

A similar update can be defined when updating a trace $G$ w.r.t. $M_t$ by a trace $H$ w.r.t. $M_d$. In this case, we use the following relation: $\neg(x = x_d) \to x_t$. Let $H = [H_0(Z), \ldots, H_k(Z)]$ be the inductive trace w.r.t. $M_d$. $H$ can be decomposed and written as $H_i(Z) := \bar{H}_i(X) \wedge \bar{H}^d_i(X_d) \wedge \phi(X, X_d)$.

Due to the definition of M<sup>d</sup> and an inductive trace, the following holds:

$$\bar{H}_i(X) \wedge \mathit{Tr}(X, X') \to \bar{H}_{i+1}(X')$$

$$\bar{H}^d_i(X_d) \wedge \mathit{Tr}(X_d, X'_d) \to \bar{H}^d_{i+1}(X'_d)$$

We can therefore update a trace $G = [G_0, \ldots, G_k]$ w.r.t. $M_t$ by defining the trace $G^* = [G^*_0, \ldots, G^*_k]$, where:

$$G^*_0 := \mathit{Init}_t \tag{10}$$

$$G^*_i(Y) := G_i(Y) \wedge \bar{H}_i(X) \wedge \bar{H}^d_i(X) \wedge \left(\bigwedge \{x_t \mid H_i(Z) \to \neg(x = x_d)\}\right) \tag{11}$$

Updating $G$ by $H$, and vice versa, as described above is based on the fact that $M_t$ over-approximates $M_d$ w.r.t. tainted variables (namely, Corollaries 1 and 2). It is therefore important to note that $G^*$, in particular, does not "gain" more precision due to this process.

**Lemma 1.** *Let G be an inductive trace w.r.t.* M<sup>t</sup> *and H an inductive trace w.r.t.* Md*. Then, the updated H*<sup>∗</sup> *and G*<sup>∗</sup> *are inductive traces w.r.t.* M<sup>d</sup> *and* Mt*, respectively.*

**Refinement.** Recall that in the current scenario, a counterexample was found in Mt, and was shown to be spurious in Md. This fact can be used to refine both M<sup>t</sup> and *G*.

As a first step, we observe that if $x = x_d$ in $M_d$, then $\neg x_t$ should hold in $M_t$. However, since $M_t$ is an over-approximation, it may allow $x$ to be tainted, namely, allow $x_t$ to evaluate to *true*.

In order to refine M<sub>t</sub> and *G*, we define a strengthening procedure for *G*, which resembles the updating procedure of the previous section. Let *H* = [H<sub>0</sub>,...,H<sub>k</sub>] be a trace w.r.t. M<sub>d</sub> and *G* = [G<sub>0</sub>,...,G<sub>k</sub>] be a trace w.r.t. M<sub>t</sub>. The strengthening of *G* is denoted *G*<sup>r</sup> = [G<sup>r</sup><sub>0</sub>,...,G<sup>r</sup><sub>k</sub>] such that:

$$G_0^r := Init_d \tag{12}$$

$$G_i^r(Y) := G_i(Y) \wedge \bar{H}_i(X) \wedge \bar{H}_i^d(X) \wedge \left(\bigwedge \{x_t \mid H_i(Z) \to \neg(x = x_d)\}\right) \wedge \left(\bigwedge \{\neg x_t \mid H_i(Z) \to (x = x_d)\}\right) \tag{13}$$

The above gives us a procedure for strengthening *G* by using *H*. Note that since M<sub>t</sub> is an over-approximation of M<sub>d</sub>, it may allow a variable x ∈ X to be tainted while in M<sub>d</sub> (and therefore in *H*) x = x<sub>d</sub>. As a result, after strengthening, *G*<sup>r</sup> is not necessarily an inductive trace w.r.t. M<sub>t</sub>, namely, G<sup>r</sup><sub>i</sub> ∧ *Tr*<sub>t</sub> → G<sup>r</sup><sub>i+1</sub> does not necessarily hold. In order to make *G*<sup>r</sup> an inductive trace, M<sub>t</sub> must be refined.

Let us assume that G<sup>r</sup><sub>i</sub> ∧ *Tr*<sub>t</sub> → G<sup>r</sup><sub>i+1</sub> does not hold, i.e., G<sup>r</sup><sub>i</sub> ∧ *Tr*<sub>t</sub> ∧ ¬G<sup>r</sup><sub>i+1</sub> is satisfiable. Considering the way *G*<sup>r</sup> is strengthened, there exists x ∈ X such that G<sup>r</sup><sub>i</sub> ∧ *Tr*<sub>t</sub> ∧ x′<sub>t</sub> is satisfiable and G<sup>r</sup><sub>i+1</sub> ⇒ ¬x<sub>t</sub>. The refinement step is defined by:

$$x\_t' = G\_i^r \text{ ? } false \;:\, (\Theta(cond) \lor (cond \; ? \; \Theta(\varphi\_1) \;:\; \Theta(\varphi\_2)))$$

This refinement step changes the next-state function of x<sub>t</sub> such that whenever G<sup>r</sup><sub>i</sub> holds, x<sub>t</sub> is forced to be *false* at the next time frame.

**Lemma 2.** *Let G*<sup>r</sup> *be a strengthened trace, and let* M<sup>r</sup><sub>t</sub> *be the result of refinement as defined above. Then, G*<sup>r</sup> *is an inductive trace w.r.t.* M<sup>r</sup><sub>t</sub>*.*

**Theorem 1.** *Let* A *be a sound and complete model checking algorithm w.r.t.* FOL(T) *for some theory* T*, such that* A *maintains an inductive trace. Assuming* Ifc-CEGAR *uses* A*, then* Ifc-CEGAR *is both sound and complete.*

*Proof (Sketch).* Soundness follows directly from the soundness of taint analysis. For completeness, assume M<sup>d</sup> is SAFE. Due to our assumption that A is sound and complete, A emits a closed inductive trace *H*. Intuitively, assuming *H* is of size k, then the next state function of every taint variable in M<sup>t</sup> can be refined to be a constant *false* after a specific number of steps. Then, *H* can be translated to a closed inductive trace *G* over M<sup>t</sup> by following the above presented formalism. Using Lemma 2 we can show that a closed inductive trace exists for the refined taint model.

### **5.2 IFC-BMC**

In this section we introduce a different method based on Bounded Model Checking (BMC) [6] that uses lazy self-composition for solving the information flow security problem. This approach is described in Algorithm 2. In addition to the program P, and the specification of high-security variables H, it uses an extra parameter BND that limits the maximum number of loop unrolls performed on the program P. (Alternatively, one can fall back to an unbounded verification method after BND is reached in BMC).

**Algorithm 2.** Ifc-BMC (P,H,BND)

```
Input: A program P, a set of high-security variables H, max unroll bound BND
Output: SAFE, UNSAFE or UNKNOWN
 1 i ← 0
 2 repeat
 3   M(i) ← LoopUnroll(P, i)
 4   Mt(i) ← EncodeTaint(M(i))
 5   TR of Ms(i) ← LazySC(M(i), Mt(i))
 6   Bad of Ms(i) ← ∨_{y ∈ L} ¬(y = y_d)
 7   result ← SolveSMT(Ms(i))
 8   if result = counterexample then
 9     return UNSAFE
10   live_taint ← CheckLiveTaint(Mt(i))
11   if live_taint = false then
12     return SAFE
13   i ← i + 1
14 until i = BND
15 return UNKNOWN
```
**Algorithm 3.** LazySC(Mt, M)

```
Input: A program model M and the corresponding taint program model Mt
Output: Transition relation of the self-composed program Trs
1 for each state update x ← φ in transition relation of M do
2   add state update x ← φ to Trs
3   tainted ← SolveSMT(query on x_t in Mt)
4   if tainted = false then
5     add state update x_d ← x to Trs
6   else
7     add state update x_d ← duplicate(φ) to Trs
8 return Trs
```

In each iteration of the algorithm (line 2), loops in the program P are unrolled (line 3) to produce a loop-free program, encoded as a transition system M(i). A new transition system Mt(i) is created (line 4) following the method described in Sect. 4.1, to capture precise taint propagation in the unrolled program M(i). Then lazy self-composition is applied (line 5), as shown in detail in Algorithm 3, based on the interplay between the taint model Mt(i) and the transition system M(i). In detail, for each variable x updated in M(i), where the state update is denoted x := ϕ, we use x<sub>t</sub> in Mt(i) to encode whether x is possibly tainted. We generate an SMT query to determine whether x<sub>t</sub> is satisfiable. If it is unsatisfiable, i.e., x<sub>t</sub> evaluates to *false*, we can conclude that high-security variables cannot affect the value of x. In this case, its duplicate variable x<sub>d</sub> in the self-composed program Ms(i) is set equal to x, eliminating the need to duplicate the computation that produces x<sub>d</sub>. Otherwise, if x<sub>t</sub> is satisfiable (or unknown), we duplicate ϕ and update x<sub>d</sub> accordingly.

The self-composed program Ms(i) created by LazySC (Algorithm 3) is then checked by a bounded model checker, where a bad state is a state in which some low-security output y (y ∈ L, where L denotes the set of low-security variables) has a different value than its duplicate variable y<sub>d</sub> (line 6). (For ease of exposition, a simple definition of bad states is shown here. This can be suitably modified to account for the Obs<sub>x</sub>(X) predicates described in Sect. 4.) A counterexample produced by the solver indicates a leak in the original program P. We also use an early termination check for BMC, encoded as an SMT-based query CheckLiveTaint, which essentially checks whether any live variable is tainted (line 10). If none of the live variables is tainted, i.e., any initial taint from high-security inputs has been squashed, then Ifc-BMC can stop unrolling the program any further. If no conclusive result is obtained, Ifc-BMC returns *UNKNOWN*.
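The core of LazySC can be sketched in a few lines of Python (an illustrative sketch, not the SeaHorn implementation: state updates are simplified to name/expression pairs, and the hypothetical `is_tainted` callback stands in for the SMT query on x<sub>t</sub> in Mt):

```python
# Illustrative sketch of lazy self-composition (Algorithm 3).
# Assumptions: each state update is a (variable, expression-string) pair,
# and `is_tainted` is a stand-in for the SMT query on x_t in Mt.

def lazy_self_compose(updates, is_tainted):
    """Build the transition relation Tr_s of the self-composed program."""
    tr_s = []
    for var, expr in updates:
        tr_s.append((var, expr))  # the original copy is always kept
        if not is_tainted(var):
            # x provably untainted: its duplicate just mirrors x,
            # avoiding duplication of the computation producing x
            tr_s.append((var + "_d", var))
        else:
            # x possibly tainted: duplicate the defining expression
            # (naive token-wise renaming stands in for real duplication)
            dup = " ".join(t + "_d" if t.isidentifier() else t
                           for t in expr.split())
            tr_s.append((var + "_d", dup))
    return tr_s

# y depends on the secret k (tainted); z does not
updates = [("y", "x + k"), ("z", "x + 1")]
tr_s = lazy_self_compose(updates, is_tainted=lambda v: v == "y")
```

The point of the laziness is visible in the result: the untainted `z` costs only a copy constraint `z_d ← z`, while only the tainted `y` pays for a duplicated computation.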

# **6 Implementation and Experiments**

We have implemented prototypes of Ifc-CEGAR and Ifc-BMC for information flow checking. Both are implemented on top of SeaHorn [18], a software verification platform that encodes programs as CHC (Constrained Horn Clause) rules. It has a frontend based on LLVM [22] and backends to Z3 [15] and other solvers. Our prototype has a few limitations. First, it does not support bit-precise reasoning and does not support complex data structures such as lists. Our implementation of symbolic taint analysis is flexible in supporting any given taint policy (i.e., rules for taint generation, propagation, and removal). It uses an encoding that fully leverages SMT-based model checking techniques for precise taint analysis. We believe this module can be independently used in other applications for security verification.

# **6.1 Implementation Details**

Ifc-CEGAR *Implementation.* As discussed in Sect. 5.1, the Ifc-CEGAR implementation uses taint analysis and self-composition synergistically and is tailored toward proving that programs are secure. Both taint analysis and self-composition are implemented as LLVM passes that instrument the program. Our prototype alternates between these two passes as the problem is being solved. The Ifc-CEGAR implementation uses Z3's CHC solver engine, Spacer. Spacer, and therefore our Ifc-CEGAR implementation, does not handle the bitvector theory, limiting the set of programs that can be verified with this prototype. Extending the prototype to support this theory is left for future work.

Ifc-BMC *Implementation.* In the Ifc-BMC implementation, the loop unroller, taint analysis, and lazy self-composition are implemented as passes that work on CHC, to generate SMT queries that are passed to the backend Z3 solver. Since the Ifc-BMC implementation uses Z3, and not Spacer, it can handle all the programs in our evaluation, unlike the Ifc-CEGAR implementation.

*Input Format.* The input to our tools is a C-program with annotations indicating which variables are *secret* and the locations at which leaks should be checked. In addition, variables can be marked as *untainted* at specific locations.

### **6.2 Evaluation Benchmarks**

For experiments, we used a machine with an Intel Core i7-4578U processor and 8 GB of RAM. We tested our prototypes on several micro-benchmarks<sup>2</sup> in addition to benchmarks inspired by real-world programs. For comparison against eager self-composition, we used the SeaHorn backend solvers on a 2-copy version of each benchmark. fibonacci is a micro-benchmark that computes the N-th Fibonacci number; it contains no secrets and serves as a sanity check taken from [33]. list 4/8/16 are programs working with linked lists, where the trailing number indicates the maximum number of nodes used. Some linked-list nodes contain secrets while others hold public data, and the verification problem is to ensure that a particular function operating on the linked list does not leak the secret data. modadd safe is a program that performs multi-word addition; modexp safe/unsafe are variants of a program performing modular exponentiation; and pwdcheck safe/unsafe are variants of a program that compares an input string with a secret password. The verification problem in these examples is to ensure that an iterator in a loop does not leak secret information, which could otherwise enable a timing attack. Among these benchmarks, list 4/8/16 use structs, while modexp safe/unsafe involve bitvector operations; neither is supported by Spacer, and thus not by our Ifc-CEGAR prototype.

### **6.3 IFC-CEGAR Results**

Table 1 shows the Ifc-CEGAR results on benchmark examples with varying parameter values. The columns show the time taken by eager self-composition (Eager SC) and Ifc-CEGAR, and the number of refinements in Ifc-CEGAR. "TO" denotes a timeout of 300 s.

We note that all examples are secure and do not leak information. Since our path-sensitive symbolic taint analysis is more precise than a type-based taint analysis, there are few counterexamples and refinements. In particular, for our first example pwdcheck safe, self-composition is not required as our path-sensitive taint analysis is able to prove that no taint propagates to the variables of interest. It is important to note that type-based taint analysis cannot prove that this example is secure. For our second example, pwdcheck2 safe, our path-sensitive taint analysis is not enough. Namely, it finds a counterexample, due to an implicit flow where a for-loop is conditioned on a tainted value, but there is no real leak because the loop executes a constant number of times.
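To illustrate the kind of benign implicit flow described above, consider a hypothetical pwdcheck2-style fragment (not the actual benchmark code): the branch condition touches the secret, so taint analysis reports a counterexample, yet the loop always executes a constant number of iterations, so no information leaks through timing:

```python
# Hypothetical pwdcheck2-like loop (not the benchmark source).
# The branch depends on the secret, so symbolic taint analysis flags the
# loop condition, but the iteration count is the constant N, so the
# counterexample is spurious.

N = 8  # fixed password length, public

def pwdcheck2(guess, secret):
    ok = True
    steps = 0
    for i in range(N):              # bound is constant, not secret-dependent
        if guess[i] != secret[i]:   # implicit flow: branch on secret data
            ok = False              # no early exit; result is accumulated
        steps += 1
    return ok, steps
```

Here `steps` equals N for every secret, which is exactly the fact the self-composition check establishes when it refutes the taint counterexample.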

<sup>2</sup> http://www.cs.princeton.edu/~aartig/benchmarks/ifc_bench.zip.


**Table 1.** Ifc-CEGAR results (time in seconds)

Our refinement-based approach can easily handle this case, where Ifc-CEGAR uses self-composition to find that the counterexample is spurious. It then refines the taint analysis model, and after one refinement step, it is able to prove that pwdcheck2 safe is secure. While these examples are fairly small, they clearly show that Ifc-CEGAR is superior to eager self-composition.

### **6.4 IFC-BMC Results**

The experimental results for Ifc-BMC are shown in Table 2, where we use some unsafe versions of benchmark examples as well. Results are shown for total time taken by eager self-composition (Eager SC) and the Ifc-BMC algorithm. (As before, "TO" denotes a timeout of 300 s.) Ifc-BMC is able to produce an answer significantly faster than eager self-composition for all examples. The last two columns show the time spent in taint checks in Ifc-BMC, and the number of taint checks performed.


**Table 2.** Ifc-BMC results (time in seconds)

To study the scalability of our prototype, we tested Ifc-BMC on the modular exponentiation program with different values for the maximum size of the integer array in the program. These results are shown in Table 3. Although the Ifc-BMC runtime grows exponentially, it is reasonably fast – less than 2 min for an array of size 64.

# **7 Related Work**

A rich body of literature has studied the verification of secure information flow in programs. Initial work dates back to Denning and Denning [16], who introduced a program analysis to ensure that confidential data does not flow to non-confidential outputs. This notion of confidentiality relates closely to: (i) non-interference, introduced by Goguen and Meseguer [17], and (ii) separability, introduced by Rushby [27]. Each of these studies a notion of secure information flow where confidential data is strictly not allowed to flow to any non-confidential output. These definitions are often too restrictive for practical programs, where secret data might sometimes be allowed to flow to some non-secret output (e.g., if the data is encrypted before output), i.e., they require declassification [29]. Our approach allows easy and fine-grained declassification.

A large body of work has also studied the use of type systems that ensure secure information flow. Due to a lack of space, we review a few exemplars and refer the reader to Sabelfeld and Myers [28] for a detailed survey. Early work in this area dates back to Volpano et al. [35], who introduced a type system that maintains secure information flow based on the work of Denning and Denning [16]. Myers introduced the JFlow programming language (later known as Jif: Java information flow) [25], which extended Java with security types. Jif has been used to build clean-slate, secure implementations of complex end-to-end systems, e.g., the Civitas [10] electronic voting system. More recently, Patrignani et al. [26] introduced the Java Jr. language, which extends Java with a security type system, automatically partitions the program into secure and non-secure parts, and executes the secure parts inside so-called protected module architectures. In


**Table 3.** Ifc-BMC results on modexp (time in seconds)

contrast to these approaches, our work can be applied to existing security-critical code in languages like C with the addition of only a few annotations.

A different approach to verifying secure information flow is the use of dynamic taint analysis (DTA) [3,12,13,21,30,31] which instruments a program with taint variables and taint tracking code. Advantages of DTA are that it is scalable to very large applications [21], can be accelerated using hardware support [13], and tracks information flow across processes, applications and even over the network [12]. However, taint analysis necessarily involves imprecision and in practice leads to both false positives and false negatives. False positives arise because taint analysis is an overapproximation. Somewhat surprisingly, false negatives are also introduced because tracking implicit flows using taint analysis leads to a deluge of false-positives [30], thus causing practical taint tracking systems to ignore implicit flows. Our approach does not have this imprecision.

Our formulation of secure information flow is based on the self-composition construction proposed by Barthe et al. [5]. A specific type of self-composition called product programs was considered by Barthe et al. [4], which does not allow control flow divergence between the two programs. In general this might miss certain bugs as it ignores implicit flows. However, it is useful in verifying cryptographic code which typically has very structured control flow. Almeida et al. [1] used the product construction to verify that certain functions in cryptographic libraries execute in constant-time.

Terauchi and Aiken [33] generalized self-composition to consider k-safety, which uses k − 1 compositions of a program with itself. Note that self-composition is a 2-safety property. An automated verifier for k-safety properties of Java programs, based on Cartesian Hoare Logic, was proposed by Sousa and Dillig [32]. A generalization of Cartesian Hoare Logic, called Quantitative Cartesian Hoare Logic, was introduced by Chen et al. [8]; the latter can also be used to reason about the execution time of cryptographic implementations. Among these efforts, our work is most closely related to that of Terauchi and Aiken [33], who used a type-based analysis as a preprocessing step to self-composition. We use a similar idea, but our taint analysis is more precise due to being path-sensitive, and it is used within an iterative CEGAR loop. Our path-sensitive taint analysis leads to fewer counterexamples and thereby cheaper self-composition, and our refinement approach can easily handle examples with benign branches. In contrast to the other efforts, our work uses lazy instead of eager self-composition, and is thus more scalable, as demonstrated in our evaluation. A recent work [2] also employs trace-based refinement in security verification, but it does not use self-composition.

Our approach has some similarities to other problems related to tainting [19]. In particular, *Change-Impact Analysis* is the problem of determining what parts of a program are affected due to a change. Intuitively, it can be seen as a form of taint analysis, where the change is treated as taint. To solve this, Gyori et al. [19] propose a combination of an imprecise type-based approach with a precise semantics-preserving approach. The latter considers the program before and after the change and finds relational equivalences between the two versions. These are then used to strengthen the type-based approach. While our work has some similarities, there are crucial differences as well. First, our taint analysis is not type-based, but is path-sensitive and preserves the correctness of the defined abstraction. Second, our lazy self-composition is a form of an abstraction-refinement framework, and allows a tighter integration between the imprecise (taint) and precise (self-composition) models.

### **8 Conclusions and Future Work**

A well-known approach for verifying secure information flow is based on the notion of self-composition. In this paper, we have introduced a new approach for this verification problem based on *lazy self-composition*. Instead of eagerly duplicating the program, lazy self-composition uses a synergistic combination of symbolic taint analysis (on a single copy program) and self-composition by duplicating relevant parts of the program, depending on the result of the taint analysis. We presented two instances of lazy self-composition: the first uses taint analysis and self-composition in a CEGAR loop; the second uses bounded model checking to dynamically query taint checks and self-composition based on the results of these dynamic checks. Our algorithms have been implemented in the SeaHorn verification platform and results show that lazy self-composition is able to verify many instances not verified by eager self-composition.

In future work, we are interested in extending lazy self-composition to support learning of quantified relational invariants. These invariants are often required when reasoning about information flow in shared data structures of unbounded size (e.g., unbounded arrays, linked lists) that contain both high- and low-security data. We are also interested in generalizing lazy self-composition beyond information flow to handle other k-safety properties like injectivity, associativity and monotonicity.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **SCINFER: Refinement-Based Verification of Software Countermeasures Against Side-Channel Attacks**

Jun Zhang1, Pengfei Gao1, Fu Song1(B) , and Chao Wang2

<sup>1</sup> ShanghaiTech University, Shanghai, China songfu@shanghaitech.edu.cn <sup>2</sup> University of Southern California, Los Angeles, CA, USA

**Abstract.** Power side-channel attacks, capable of deducing secrets using statistical analysis techniques, have become a serious threat to devices in cyber-physical systems and the Internet of Things. Random masking is a widely used countermeasure for removing the statistical dependence between secret data and side-channel leaks. Although there are techniques for verifying whether software code has been perfectly masked, they are limited in accuracy and scalability. To bridge this gap, we propose a refinement-based method for verifying masking countermeasures. Our method is more accurate than prior syntactic type inference based approaches and more scalable than prior model-counting based approaches using SAT or SMT solvers. Indeed, it can be viewed as a gradual refinement of a set of semantic type inference rules for reasoning about distribution types. These rules are kept *abstract* initially to allow fast deduction, and then made *concrete* when the abstract version is not able to resolve the verification problem. We have implemented our method in a tool and evaluated it on cryptographic benchmarks including AES and MAC-Keccak. The results show that our method significantly outperforms state-of-the-art techniques in terms of both accuracy and scalability.

# **1 Introduction**

Cryptographic algorithms are widely used in embedded computing devices, including SmartCards, to form the backbone of their security mechanisms. In general, security is established by assuming that the adversary has access to the input and output, but not internals, of the implementation. Unfortunately, in practice, attackers may recover cryptographic keys by analyzing physical information leaked through side channels. These so-called *side-channel attacks* exploit the statistical dependence between secret data and non-functional properties of a computing device such as the execution time [38], power consumption [39] and electromagnetic radiation [49]. Among them, *differential power analysis* (DPA) is an extremely popular and effective class of attacks [30,42].

© The Author(s) 2018 H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 157–177, 2018.

This work was supported primarily by the National Natural Science Foundation of China (NSFC) grants 61532019 and 61761136011. Chao Wang was supported by the U.S. National Science Foundation (NSF) grant CNS-1617203.

https://doi.org/10.1007/978-3-319-96142-2\_12

**Fig. 1.** Overview of SCInfer, where "ICR" denotes the intermediate computation result.

To thwart DPA attacks, *masking* has been proposed to break the statistical dependence between secret data and side-channel leaks through randomization. Although various masked implementations have been proposed, e.g., for AES or its non-linear components (S-boxes) [15,37,51,52], checking if they are correct is always tedious and error-prone. Indeed, there are published implementations [51,52] later shown to be incorrect [21,22]. Therefore, formally verifying these countermeasures is important.

Previously, two types of verification methods for masking countermeasures have been studied [54]: one is type inference based [10,44] and the other is model counting based [26,27]. Type inference based methods [10,44] are fast and sound, meaning they can quickly prove the computation is leakage free, e.g., if the result is syntactically independent of the secret data or has been masked by random variables not used elsewhere. However, syntactic type inference is *not* complete in that it may report *false positives*. In contrast, model counting based methods [26,27] are sound and complete: they check if the computation is statistically independent of the secret [15]. However, due to the inherent complexity of model counting, they can be extremely slow in practice.

The aforementioned gap, in terms of both accuracy and scalability, has not been bridged by more recent approaches [6,13,47]. For example, Barthe et al. [6] proposed some inference rules to prove masking countermeasures based on the observation that certain operators (e.g., XOR) are *invertible*: in the absence of such operators, purely algebraic laws can be used to normalize expressions of computation results to apply the rules of invertible functions. This normalization is applied to each expression once, as it is costly. Ouahma et al. [47] introduced a linear-time algorithm based on finer-grained syntactical inference rules. A similar idea was explored by Bisi et al. [13] for analyzing higher-order masking: like in [6,47], however, the method is not complete, and does not consider non-linear operators which are common in cryptographic software.

**Our Contribution.** We propose a refinement based approach, named SCInfer, to bridge the gap between prior techniques which are either fast but inaccurate or accurate but slow. Figure 1 depicts the overall flow, where the input consists of the program and a set of variables marked as *public*, *private*, or *random*. We first transform the program to an intermediate representation: the data dependency graph (DDG). Then, we traverse the DDG in a topological order to infer a *distribution type* for each intermediate computation result. Next, we check if all intermediate computation results are perfectly masked according to their types. If any of them cannot be resolved in this way, we invoke an SMT solver based refinement procedure, which leverages either satisfiability (SAT) solving or model counting (SAT#) to prove leakage freedom. In both cases, the result is fed back to improve the type system. Finally, based on the refined type inference rules, we continue to analyze other intermediate computation results.

Thus, SCInfer can be viewed as a synergistic integration of a semantic rule based approach for inferring *distribution types* and an SMT solver based approach for refining these inference rules. Our type inference rules (Sect. 3) are inspired by Barthe et al. [6] and Ouahma et al. [47] in that they are designed to infer distribution types of intermediate computation results. However, there is a crucial difference: their inference rules are syntactic with fixed accuracy, i.e., relying solely on structural information of the program, whereas ours are *semantic* and the accuracy can be gradually improved with the aid of our SMT solver based analysis. At a high level, our semantic type inference rules subsume their syntactic type inference rules.

The main advantage of using type inference is the ability to *quickly* obtain sound proofs: when there is no leak in the computation, the type system can often produce a proof quickly; furthermore, the result is always conclusive. However, if type inference fails to produce a proof, the verification problem remains unresolved. Thus, to be complete, we propose to leverage SMT solvers to resolve these *left-over* verification problems. Here, solvers are used either to check the satisfiability (SAT) of a logical formula or to count its satisfying solutions (SAT#), the latter of which, although expensive, is powerful enough to completely decide whether the computation is perfectly masked. Finally, by feeding solver results back to the type inference system, we can gradually improve its accuracy. Thus, overall, the method is both sound and complete.
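For small bit-level expressions, the SAT#-style check can be sketched by brute-force enumeration (illustrative code, not SCInfer's implementation; the function name and encoding are ours): an intermediate result is perfectly masked iff, for every fixing of the public and secret bits, the number of random-bit assignments making it 1 is the same.

```python
from itertools import product

def perfectly_masked(f, n_secret, n_random, n_public=0):
    """Brute-force SAT#-style check for a bit-valued function
    f(k_bits, r_bits, p_bits) -> 0/1: f is perfectly masked iff, for each
    public input, the count of random assignments yielding 1 is the same
    for all values of the secret bits."""
    for p in product((0, 1), repeat=n_public):
        counts = set()
        for k in product((0, 1), repeat=n_secret):
            ones = sum(f(k, r, p) for r in product((0, 1), repeat=n_random))
            counts.add(ones)
        if len(counts) > 1:  # distribution depends on the secret: a leak
            return False
    return True

# c6 = (k ^ r1 ^ r2) & r3 from Fig. 2 is perfectly masked; k & r is not.
masked = perfectly_masked(lambda k, r, p: (k[0] ^ r[0] ^ r[1]) & r[2], 1, 3)
unmasked = perfectly_masked(lambda k, r, p: k[0] & r[0], 1, 1)
```

The exponential enumeration is exactly why model counting does not scale, and why resolving most nodes by type inference first pays off.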

We have implemented our method in a software tool named SCInfer and evaluated it on publicly available benchmarks [26,27], which implement various cryptographic algorithms such as AES and MAC-Keccak. Our experiments show SCInfer is both effective in obtaining proofs quickly and scalable for handling realistic applications. Specifically, it can resolve most of the verification subproblems using type inference and, as a result, satisfiability (SAT) based analysis needs to be applied to few left-over cases. Only in rare cases, the most heavyweight analysis (SAT#) needs to be invoked.

To sum up, the main contributions of this work are as follows:


The remainder of this paper is organized as follows. After reviewing the basics in Sect. 2, we present our semantic type inference system in Sect. 3 and our refinement method in Sect. 4. Then, we present our experimental results in Sect. 5 and comparison with related work in Sect. 6. We give our conclusions in Sect. 7.

### **2 Preliminaries**

In this section, we define the type of programs considered in this work and then review the basics of side-channel attacks and masking countermeasures.

### **2.1 Probabilistic Boolean Programs**

Following the notation used in [15,26,27], we assume that the program *P* implements a cryptographic function, e.g., *c* ← *P*(*p*, *k*) where *p* is the plaintext, *k* is the secret key and *c* is the ciphertext. Inside *P*, random variable *r* may be used to mask the secret key while maintaining the input-output behavior of *P*. Therefore, *P* may be viewed as a probabilistic program. Since loops, function calls, and branches may be removed via automated rewriting [26,27] and integer variables may be converted to bits, for verification purposes, we assume that *P* is a straight-line probabilistic Boolean program, where each instruction has a unique label and at most two operands.

Let *k* (resp. *r*) be the set of secret (resp. random) bits, *p* the public bits, and *c* the variables storing intermediate results. Thus, the set of variables is *V* = *k* ∪ *r* ∪ *p* ∪ *c*. In addition, the program uses a set op of operators including negation (¬), and (∧), or (∨), and exclusive-or (⊕). A *computation* of *P* is a sequence *c*<sub>1</sub> ← **i**<sub>1</sub>(*p*, *k*, *r*); ··· ; *c*<sub>n</sub> ← **i**<sub>n</sub>(*p*, *k*, *r*)

**Fig. 2.** An example for masking countermeasure.

where, for each 1 ≤ *i* ≤ *n*, the value of **i**<sub>i</sub> is expressed in terms of *p*, *k* and *r*. Each random bit in *r* is uniformly distributed in {0, 1}; the sole purpose of using them in *P* is to ensure that *c*<sub>1</sub>, ..., *c*<sub>n</sub> are statistically independent of the secret *k*.

**Data Dependency Graph (DDG).** Our internal representation of *P* is a graph G*<sup>P</sup>* = (*N*, *E*, λ), where *N* is the set of nodes, *E* is the set of edges, and λ is a labeling function.


We may use λ<sub>1</sub>(*l*) = *c* and λ<sub>2</sub>(*l*) = ◦ to denote the first and second elements of the pair λ(*l*) = (*c*, ◦), respectively. We may also use *l*.lft to denote the left child of *l*, and *l*.rgt to denote the right child if it exists. A subtree rooted at node *l* corresponds to an intermediate computation result. When the context is clear, we use the following terms interchangeably: a node *l*, the subtree *T* rooted at *l*, and the intermediate computation result *c* = λ<sub>1</sub>(*l*). Let |*P*| denote the total number of nodes in the DDG.

Figure 2 shows an example where *k* = {*k*}, *r* = {*r*1,*r*2,*r*3}, *c* = {*c*1, *c*2, *c*3, *c*4, *c*5, *c*6} and *p* = ∅. On the left is a program written in a C-like language except that ⊕ denotes XOR and ∧ denotes AND. On the right is the DDG, where

*c*₃ = *c*₂ ⊕ *c*₁ = (*r*₁ ⊕ *r*₂) ⊕ (*k* ⊕ *r*₂) = *k* ⊕ *r*₁
*c*₄ = *c*₃ ⊕ *c*₂ = ((*r*₁ ⊕ *r*₂) ⊕ (*k* ⊕ *r*₂)) ⊕ (*r*₁ ⊕ *r*₂) = *k* ⊕ *r*₂
*c*₅ = *c*₄ ⊕ *r*₁ = (((*r*₁ ⊕ *r*₂) ⊕ (*k* ⊕ *r*₂)) ⊕ (*r*₁ ⊕ *r*₂)) ⊕ *r*₁ = *k* ⊕ *r*₁ ⊕ *r*₂
*c*₆ = *c*₅ ∧ *r*₃ = ((((*r*₁ ⊕ *r*₂) ⊕ (*k* ⊕ *r*₂)) ⊕ (*r*₁ ⊕ *r*₂)) ⊕ *r*₁) ∧ *r*₃ = (*k* ⊕ *r*₁ ⊕ *r*₂) ∧ *r*₃

Let supp : *N* → 2<sup>*k*∪*r*∪*p*</sup> be a function mapping each node *l* to its set of support variables. That is, supp(*l*) = ∅ if λ₁(*l*) ∈ {0, 1}; supp(*l*) = {*x*} if λ₁(*l*) = *x* ∈ *k* ∪ *r* ∪ *p*; and supp(*l*) = supp(*l*.lft) ∪ supp(*l*.rgt) otherwise. Thus, the function returns the set of variables that λ₁(*l*) depends upon structurally.
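The bottom-up computation of supp can be sketched in a few lines of Python on the running example of Fig. 2; the node encoding and function names here are our own illustration, not the paper's implementation:

```python
# Hypothetical node representation: a DDG node is a dict with an operator
# and children; leaves carry the name of an input variable.

def leaf(name):
    return {"var": name, "op": None, "kids": []}

def node(op, *kids):
    return {"var": None, "op": op, "kids": list(kids)}

def supp(n):
    """Structural support: the input variables mentioned in the subtree."""
    if n["op"] is None:
        return {n["var"]}
    out = set()
    for kid in n["kids"]:
        out |= supp(kid)
    return out

# The DDG of the running example: c1 = k xor r2, c2 = r1 xor r2, ...
k, r1, r2 = leaf("k"), leaf("r1"), leaf("r2")
c1 = node("xor", k, r2)
c2 = node("xor", r1, r2)
c3 = node("xor", c2, c1)
c4 = node("xor", c3, c2)

print(sorted(supp(c4)))  # ['k', 'r1', 'r2']
```

Note that supp(*c*₄) reports all three inputs even though *c*₄ simplifies to *k* ⊕ *r*₂; this is exactly the over-approximation the semantic analysis below refines.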

Given a node *l* whose corresponding expression *e* is defined in terms of variables in *V*, we say that *e* is semantically dependent on a variable *r* ∈ *V* if and only if there exist two assignments, π₁ and π₂, such that π₁(*r*) ≠ π₂(*r*) and π₁(*x*) = π₂(*x*) for every *x* ∈ *V* \ {*r*}, and the values of *e* differ under π₁ and π₂.

Let semd : *N* → 2<sup>*r*</sup> be a function such that semd(*l*) denotes the set of *random variables* upon which the expression *e* of *l* semantically depends. Thus, semd(*l*) ⊆ supp(*l*); and for each *r* ∈ supp(*l*) \ semd(*l*), we know λ₁(*l*) is semantically independent of *r*. More importantly, there is often a gap between supp(*l*) ∩ *r* and semd(*l*), namely semd(*l*) ⊆ supp(*l*) ∩ *r*, which is why our gradual refinement of semantic type inference rules can outperform methods based solely on syntactic type inference.

Consider the node *l*<sub>*c*₄</sub> in Fig. 2: we have supp(*l*<sub>*c*₄</sub>) = {*r*₁, *r*₂, *k*}, semd(*l*<sub>*c*₄</sub>) = {*r*₂}, and supp(*l*<sub>*c*₄</sub>) ∩ *r* = {*r*₁, *r*₂}. Furthermore, if the random bits are uniformly distributed in {0, 1}, then *c*₄ is both *uniformly distributed* and *secret independent* (Sect. 2.2).
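For a handful of bits, semantic dependence can be checked by exhaustive enumeration (the paper instead uses a SAT encoding, Sect. 3.2). A brute-force sketch with our own naming, computing all variables that *c*₄ semantically depends on; the paper's semd then keeps only the random bits:

```python
from itertools import product

VARS = ["k", "r1", "r2", "r3"]

# c4 from Fig. 2; structurally it mentions k, r1, r2, but it simplifies to k xor r2.
c4 = lambda k, r1, r2, r3: ((r1 ^ r2) ^ (k ^ r2)) ^ (r1 ^ r2)

def semantic_deps(f):
    """Variables whose flip changes f under some assignment of the others."""
    deps = set()
    for i, v in enumerate(VARS):
        for bits in product([0, 1], repeat=len(VARS)):
            flipped = list(bits)
            flipped[i] ^= 1                     # toggle only variable v
            if f(*bits) != f(*flipped):
                deps.add(v)
                break
    return deps

print(sorted(semantic_deps(c4)))  # ['k', 'r2']: r1 is only structural
```

Restricting the result {*k*, *r*₂} to the random bits gives exactly semd(*l*<sub>*c*₄</sub>) = {*r*₂}, matching the example above.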

### **2.2 Side-Channel Attacks and Masking**

We assume the adversary has access to the public input *p* and output *c*, but not the secret *k* or the random bits *r*, of the program *c* ← *P*(*p*, *k*). However, the adversary may have access to side-channel leaks that reveal the joint distribution of at most *d* intermediate computation results *c*₁, ··· , *c*<sub>*d*</sub> (e.g., via differential power analysis [39]). Under these assumptions, the goal of the adversary is to deduce information about *k*. To model the leakage of each instruction, we consider a widely used, value-based model, called the Hamming Weight (HW) model; other power leakage models, such as the transition-based model [5], can be used similarly [6].

Let [*n*] denote the set {1, ··· , *n*} of natural numbers, where *n* ≥ 1. We call a set with *d* elements a *d-set*. Given concrete values (*p*, *k*) for the variables (*p*, *k*) and a *d*-set {*c*₁, ··· , *c*<sub>*d*</sub>} of intermediate computation results, we use D<sub>*p*,*k*</sub>(*c*₁, ··· , *c*<sub>*d*</sub>) to denote their joint distribution induced by instantiating *p* and *k* with *p* and *k*, respectively. Formally, for each vector of values (*v*₁, ··· , *v*<sub>*d*</sub>) in the probability space {0, 1}<sup>*d*</sup>, we have D<sub>*p*,*k*</sub>(*c*₁, ··· , *c*<sub>*d*</sub>)(*v*₁, ··· , *v*<sub>*d*</sub>) =

$$\frac{|\{r \in \{0, 1\}^{|r|} \mid v\_1 = \mathbf{i}\_1(p, k, r),\ \dots,\ v\_d = \mathbf{i}\_d(p, k, r)\}|}{2^{|r|}}.$$
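For small bit counts this distribution can be computed by directly enumerating the random bits; a brute-force sketch (names are ours) on the simplified *c*₄ = *k* ⊕ *r*₂ from the running example:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def joint_dist(fns, k, r_names):
    """D_{p,k}: enumerate all assignments of the uniform random bits and
    count how often each vector of intermediate values occurs."""
    counts = Counter()
    total = 2 ** len(r_names)
    for bits in product([0, 1], repeat=len(r_names)):
        env = dict(zip(r_names, bits))
        counts[tuple(f(k, env) for f in fns)] += 1
    return {v: Fraction(n, total) for v, n in counts.items()}

c4 = lambda k, env: k ^ env["r2"]   # simplified form of c4
d = joint_dist([c4], k=1, r_names=["r1", "r2", "r3"])
print(d)  # both outcomes (0,) and (1,) have probability 1/2
```

The result is uniform regardless of the chosen secret, which is why *c*₄ is both uniformly distributed and secret independent.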

**Definition 1.** *We say a d-set* {*c*₁, ··· , *c*<sub>*d*</sub>} *of intermediate computation results is*

*–* uniformly distributed *if* D<sub>*p*,*k*</sub>(*c*₁, ··· , *c*<sub>*d*</sub>) *is a uniform distribution for any p and k;*

*–* secret independent *if* D<sub>*p*,*k*</sub>(*c*₁, ··· , *c*<sub>*d*</sub>) = D<sub>*p*,*k*′</sub>(*c*₁, ··· , *c*<sub>*d*</sub>) *for any* (*p*, *k*) *and* (*p*, *k*′)*.*

Note the difference between the two notions: a uniformly distributed *d*-set is always secret independent, but a secret independent *d*-set is not always uniformly distributed.

**Definition 2.** *A program P is* order-*d* perfectly masked *if every k-set* {*c*₁, ··· , *c*<sub>*k*</sub>} *of P with k* ≤ *d is secret independent. When P is (order-*1*) perfectly masked, we may simply say it is perfectly masked.*

To decide if *P* is order-*d* perfectly masked, it suffices to check whether there exist a *d*-set and two pairs (*p*, *k*) and (*p*, *k*′) such that D<sub>*p*,*k*</sub>(*c*₁, ··· , *c*<sub>*d*</sub>) ≠ D<sub>*p*,*k*′</sub>(*c*₁, ··· , *c*<sub>*d*</sub>). In this context, the main challenge is computing D<sub>*p*,*k*</sub>(*c*₁, ··· , *c*<sub>*d*</sub>), which is essentially a *model-counting* (SAT#) problem. In the remainder of this paper, we focus on developing an efficient method for verifying (order-1) perfect masking, although our method can be extended to higher-order masking as well.
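At order 1 this check reduces to comparing, for each secret value, the distribution of a single result over the uniform random bits. A brute-force model-counting sketch (our own illustration, not the paper's SMT encoding):

```python
from itertools import product
from collections import Counter

def dist(f, k, n_rand):
    """Distribution of f over uniform random bits, secret fixed to k."""
    return Counter(f(k, bits) for bits in product([0, 1], repeat=n_rand))

def perfectly_masked(f, n_rand):
    """Order-1 check: the distribution is identical for both secret values."""
    return dist(f, 0, n_rand) == dist(f, 1, n_rand)

masked = lambda k, r: k ^ r[0]            # uniform for either k
leaky  = lambda k, r: (k ^ r[0]) & r[0]   # distribution depends on k

print(perfectly_masked(masked, 1), perfectly_masked(leaky, 1))  # True False
```

The second example leaks because for *k* = 1 it is constantly 0, whereas for *k* = 0 it equals the random bit itself.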

**Gap in Current State of Knowledge.** Existing methods for verifying masking countermeasures are either *fast but inaccurate*, e.g., when they rely solely on syntactic type inference (structural information provided by supp in Sect. 2.1) or *accurate but slow*, e.g., when they rely solely on model-counting. In contrast, our method gradually refines a set of semantic type-inference rules (i.e., using semd instead of supp as defined in Sect. 2.1) where constraint solvers (SAT and SAT#) are used on demand to resolve ambiguity and improve the accuracy of type inference. As a result, it can achieve the best of both worlds.

# **3 The Semantic Type Inference System**

We first introduce our distribution types, which are inspired by prior work in [6,13,47], together with some auxiliary data structures; then, we present our inference rules.

# **3.1 The Type System**

Let T = {CST, RUD, SID, NPM, UKD} be the set of distribution types for intermediate computation results, where ⟦*c*⟧ denotes the type of *c* ← **i**(*p*, *k*, *r*). Specifically, CST marks a constant, RUD a result that is randomized to the uniform distribution, SID a result whose distribution is secret independent, NPM a result that is not perfectly masked (leaky), and UKD a result whose distribution is unknown.


**Definition 3.** *Let* unq : *N* → *r and* dom : *N* → *r be two functions such that (i) for each terminal node l* ∈ *LV, if* λ1(*l*) ∈ *r, then* unq(*l*) = dom(*l*) = λ1(*l*)*; otherwise* unq(*l*) = dom(*l*) = supp(*l*) = ∅*; and (ii) for each internal node l* ∈ *L, we have*


**Fig. 3.** Our semantic type-inference rules. The NPM type is not yet used here; its inference rules will be added in Fig. 5, since they rely on the SMT-solver-based analyses.

Both unq(*l*) and dom(*l*) are computable in time linear in |*P*| [47]. Following the proofs in [6,47], it is easy to establish the following observation: given an intermediate computation result *c* ← **i**(*p*, *k*, *r*) labeled by *l*, the following statements hold:


Figure 3 shows our type inference rules, which concretize these observations. When multiple rules could be applied to a node *l* ∈ *N*, we always choose the rule that leads to ⟦*l*⟧ = RUD. If no rule is applicable at *l*, we set ⟦*l*⟧ = UKD. When the context is clear, we may use *l* and *c* interchangeably for λ₁(*l*) = *c*. The correctness of these inference rules follows directly from the definitions.

**Theorem 1.** *For every intermediate computation result c* ← *i*(*p*, *k*, *r*) *labeled by l,*


To improve efficiency, our inference rules may be applied twice: first using the supp function, which extracts structural information from the program (cf. Sect. 2.1), and then using the semd function, which is slower to compute but significantly more accurate. Since semd(*l*) ⊆ supp(*l*) for all *l* ∈ *N*, this is always sound. Moreover, type inference is invoked a second time only if, after the first pass, ⟦*l*⟧ remains UKD.

*Example 1.* When using type inference with supp on the running example, we have

⟦*r*₁⟧ = ⟦*r*₂⟧ = ⟦*r*₃⟧ = ⟦*c*₁⟧ = ⟦*c*₂⟧ = ⟦*c*₃⟧ = RUD, ⟦*k*⟧ = ⟦*c*₄⟧ = ⟦*c*₅⟧ = ⟦*c*₆⟧ = UKD

When using type inference with semd (for the second time), we have

⟦*r*₁⟧ = ⟦*r*₂⟧ = ⟦*r*₃⟧ = ⟦*c*₁⟧ = ⟦*c*₂⟧ = ⟦*c*₃⟧ = ⟦*c*₄⟧ = ⟦*c*₅⟧ = RUD, ⟦*k*⟧ = UKD, ⟦*c*₆⟧ = SID

### **3.2 Checking Semantic Independence**

Unlike supp(*l*), which only extracts structural information from the program and hence can be computed syntactically, semd(*l*) is more expensive to compute. In this subsection, we present a method that leverages the SMT solver to check, for any intermediate computation result *c* ← **i**(*p*, *k*, *r*) and any random bit *r* ∈ *r*, whether *c* is semantically dependent on *r*. Specifically, we formulate it as a satisfiability (SAT) problem over the formula Φ<sub>*s*</sub>, defined as follows:

$$
\Phi\_s := \Theta\_s^{r=0}(c\_0, \mathfrak{p}, \mathfrak{k}, \mathfrak{r} \setminus \{r\}) \wedge \Theta\_s^{r=1}(c\_1, \mathfrak{p}, \mathfrak{k}, \mathfrak{r} \setminus \{r\}) \wedge \Theta\_s^{\neq}(c\_0, c\_1),
$$

where Θ<sub>*s*</sub><sup>*r*=0</sup> (resp. Θ<sub>*s*</sub><sup>*r*=1</sup>) encodes the relation **i**(*p*, *k*, *r*) with *r* replaced by 0 (resp. 1), *c*₀ and *c*₁ are copies of *c*, and Θ<sub>*s*</sub><sup>≠</sup> asserts that the two outputs differ even under the same inputs.

In logic synthesis and optimization, when *r* ∉ semd(*l*), *r* is called a *don't care* variable [36]. Therefore, it is easy to see why the following theorem holds.

**Theorem 2.** *Φ<sub>s</sub> is unsatisfiable iff the value of r does not affect the value of c, i.e., c is semantically independent of r. Moreover, the size of the formula Φ<sub>s</sub> is linear in* |*P*|*.*
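Semantically, Φ<sub>*s*</sub> asks whether the two copies of *c* (with *r* fixed to 0 and to 1) can ever disagree under shared remaining inputs. A brute-force rendering of that check (our own naming; the SAT encoding itself is not reproduced):

```python
from itertools import product

def phi_s_satisfiable(f, var_names, r):
    """True iff some shared assignment of the other inputs makes the r=0 and
    r=1 copies of c differ, i.e., c semantically depends on r."""
    others = [v for v in var_names if v != r]
    for bits in product([0, 1], repeat=len(others)):
        env = dict(zip(others, bits))
        if f({**env, r: 0}) != f({**env, r: 1}):
            return True
    return False

# c4 of the running example; it simplifies to k xor r2.
c4 = lambda e: ((e["r1"] ^ e["r2"]) ^ (e["k"] ^ e["r2"])) ^ (e["r1"] ^ e["r2"])
print(phi_s_satisfiable(c4, ["k", "r1", "r2"], "r1"))  # False: r1 is a don't care
print(phi_s_satisfiable(c4, ["k", "r1", "r2"], "r2"))  # True
```

Unsatisfiability of Φ<sub>*s*</sub> for *r*₁ is what licenses treating *r*₁ as a don't care in the incremental algorithm of Sect. 4.1.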

**Fig. 4.** Our composition rules for handling *sets* of intermediate computation results.

### **3.3 Verifying Higher-Order Masking**

The type system so far targets *first-order* masking. We now outline how it extends to the verification of higher-order masking. Generally speaking, we have to check, for every *k*-set {*c*₁, ··· , *c*<sub>*k*</sub>} of intermediate computation results with *k* ≤ *d*, that the joint distribution is either randomized to uniform distribution (RUD) or secret independent (SID).

To tackle this problem, we lift supp, semd, unq, and dom to *sets* of computation results as follows: for each *k*-set {*c*1, ··· , *ck*},


Our inference rules are extended by adding the composition rules shown in Fig. 4.

**Theorem 3.** *For every k-set* {*c*₁, ··· , *c*<sub>*k*</sub>} *of intermediate computation results,*


We remark that the semd function in these composition rules could also be safely replaced by the supp function, just as before. Furthermore, to verify more efficiently that program *P* is perfectly masked against order-*d* attacks, we can incrementally apply the type inference for each *k*-set, where *k* = 1, 2, ..., *d*.

### **4 The Gradual Refinement Approach**

In this section, we present our method for gradually refining the type inference system by leveraging SMT solver based techniques. Adding solvers to the sound type system makes it complete as well, thus allowing it to detect side-channel leaks whenever they exist, in addition to proving the absence of such leaks.

#### **4.1 SMT-Based Approach**

For a given computation *c* ← **i**(*p*, *k*, *r*), the verification of perfect masking (Definition 2) can be reduced to the *satisfiability* of the logical formula (Ψ) defined as follows:

$$\exists p.\ \exists k.\ \exists k'.\ \Big(\sum\_{v\_r \in \{0,1\}^{|r|}} \mathbf{i}(p, k, v\_r) \neq \sum\_{v\_r \in \{0,1\}^{|r|}} \mathbf{i}(p, k', v\_r)\Big).$$

Intuitively, given values (*v*<sub>*p*</sub>, *v*<sub>*k*</sub>) of (*p*, *k*), *count* = Σ<sub>*v*<sub>*r*</sub> ∈ {0,1}<sup>|*r*|</sup></sub> **i**(*v*<sub>*p*</sub>, *v*<sub>*k*</sub>, *v*<sub>*r*</sub>) denotes the number of assignments of the random variables *r* under which **i**(*v*<sub>*p*</sub>, *v*<sub>*k*</sub>, *r*) evaluates to logical 1. When the random bits in *r* are uniformly distributed in {0, 1}, *count*/2<sup>|*r*|</sup> is the probability of **i**(*v*<sub>*p*</sub>, *v*<sub>*k*</sub>, *r*) being logical 1 for the given pair (*v*<sub>*p*</sub>, *v*<sub>*k*</sub>). Therefore, Ψ is unsatisfiable if and only if *c* is perfectly masked.

Following Eldib et al. [26,27], we encode the formula Ψ as a quantifier-free firstorder logic formula to be solved by an off-the-shelf SMT solver (e.g., Z3):

$$(\bigwedge\_{r=0}^{2^{|r|}-1} \Theta\_k^r) \wedge (\bigwedge\_{r=0}^{2^{|r|}-1} \Theta\_{k'}^r) \wedge \Theta\_{b2i} \wedge \Theta\_{\neq}$$


*Example 2.* In the running example, for instance, verifying whether node *c*₄ is perfectly masked requires the SMT-based analysis. For brevity, we omit the detailed logical formula while pointing out that, by invoking the SMT solver six times, one can get the following result: ⟦*c*₁⟧ = ⟦*c*₂⟧ = ⟦*c*₃⟧ = ⟦*c*₄⟧ = ⟦*c*₅⟧ = ⟦*c*₆⟧ = SID.

**Fig. 5.** Complementary rules used during refinement of the type inference (Fig. 3).

Although the SMT formula size is linear in |*P*|, the number of distinct copies is exponential in the number of random bits used in the computation. Thus, the approach cannot be applied to large programs. To overcome this problem, incremental algorithms [26,27] were proposed to reduce the formula size using partitioning and heuristic reduction.

**Incremental SMT-Based Algorithm.** Given a computation *c* ← **i**(*p*, *k*, *r*) that corresponds to a subtree *T* rooted at *l* in the DDG, we search for an internal node *l*<sub>*s*</sub> in *T* (a *cut-point*) such that dom(*l*<sub>*s*</sub>) ∩ unq(*l*) ≠ ∅. A cut-point is *maximal* if there is no other cut-point on the path from *l* to *l*<sub>*s*</sub>. Let *T*′ be the *simplified tree* obtained from *T* by replacing every subtree rooted at a maximal cut-point with a random variable from dom(*l*<sub>*s*</sub>) ∩ unq(*l*). Then, *T* is SID iff *T*′ is SID.

The main observation is that if *l*<sub>*s*</sub> is a cut-point, there is a random variable *r* ∈ dom(*l*<sub>*s*</sub>) ∩ unq(*l*), which implies that λ₁(*l*<sub>*s*</sub>) is RUD. Here, *r* ∈ unq(*l*) implies that λ₁(*l*<sub>*s*</sub>) can be seen as a *fresh* random variable when we evaluate *l*. Consider the node *c*₃ in our running example: it is easy to see that *r*₁ ∈ dom(*c*₂) ∩ unq(*c*₃). Therefore, for the purpose of verifying *c*₃, the entire subtree rooted at *c*₂ can be replaced by the random variable *r*₁.
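This cut-point replacement can be validated by brute force on the running example: substituting the random bit *r*₁ for the subtree *c*₂ = *r*₁ ⊕ *r*₂ leaves the distribution of *c*₃ unchanged for every secret value (a sketch under our own naming):

```python
from itertools import product
from collections import Counter

def dist(f, k):
    """Distribution of f over the two random bits, secret fixed to k."""
    return Counter(f(k, r1, r2) for r1, r2 in product([0, 1], repeat=2))

orig = lambda k, r1, r2: (r1 ^ r2) ^ (k ^ r2)  # c3 = c2 xor c1
simp = lambda k, r1, r2: r1 ^ (k ^ r2)         # subtree c2 replaced by r1

for k in (0, 1):
    assert dist(orig, k) == dist(simp, k)
print("distributions agree for both secret values")
```

Both versions are uniform over {0, 1} for either secret, so the SID verdict transfers from the simplified tree back to the original one.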

In addition to partitioning, heuristic rules [26,27] can be used to simplify SMT solving: (1) when constructing the formula Φ of *c*, all random variables in supp(*l*) \ semd(*l*), which are *don't cares*, can be replaced by constant 1 or 0; (2) the No-Key and Sid rules in Fig. 3, instantiated with the supp function, are used to skip some SMT checks.

*Example 3.* When applying the incremental SMT-based approach to our running example, *c*₁ has to be decided by SMT, but *c*₂ is skipped due to the No-Key rule.

As for *c*₃, since *r*₁ ∈ dom(*c*₂) ∩ unq(*c*₃), *c*₂ is a cut-point and the subtree rooted at *c*₂ can be replaced by *r*₁, leading to the simplified computation *r*₁ ⊕ (*r*₂ ⊕ *k*); subsequently it is skipped by the Sid rule with supp. Note that the Sid rule is not applicable to the original subtree, because *r*₂ occurs in the support of both children of *c*₃.

There is no cut-point for *c*₄, so it is checked using the SMT solver. But since *c*₄ is semantically independent of *r*₁ (a *don't care* variable), we replace *r*₁ by 1 (or 0) when constructing the formula Φ, to reduce the SMT formula size.

### **4.2 Feeding SMT-Based Analysis Results Back to Type System**

Consider a scenario where initially the type system (cf. Sect. 3) failed to resolve a node *l*, i.e., ⟦*l*⟧ = UKD, but the SMT-based approach resolved it as either NPM or SID. Such results should be *fed back* to improve the type system, which may lead to two favorable outcomes: (1) marking more nodes as perfectly masked (RUD or SID) and (2) marking more nodes as leaky (NPM), which means we can avoid expensive SMT calls for these nodes. More specifically, if SMT-based analysis shows that *l* is perfectly masked, the type of *l* can be refined to ⟦*l*⟧ = SID; feeding it back to the type system allows us to infer more types for nodes that structurally depend on *l*.

On the other hand, if SMT-based analysis shows that *l* is not perfectly masked, its type can be refined to ⟦*l*⟧ = NPM; feeding it back allows the type system to infer that other nodes may be NPM as well. To achieve the second outcome above, we add the NPM-related type inference rules shown in Fig. 5. When they are added to the type system of Fig. 3, more NPM-typed nodes can be deduced, which allows our method to skip the (more expensive) SMT-based checking of NPM.

*Example 4.* Consider the example DDG in Fig. 6. By applying the original type inference approach with either supp or semd, we have

⟦*c*₁⟧ = ⟦*c*₄⟧ = RUD, ⟦*c*₂⟧ = ⟦*c*₃⟧ = ⟦*c*₆⟧ = SID, ⟦*c*₅⟧ = ⟦*c*₇⟧ = UKD.

In contrast, by applying SMT-based analysis to *c*₅, we can deduce ⟦*c*₅⟧ = SID. Feeding ⟦*c*₅⟧ = SID back to the original type system, and then applying the Sid rule to *c*₇ = *c*₅ ⊕ *c*₆, we are able to deduce ⟦*c*₇⟧ = SID. Without refinement, this was not possible.

### **4.3 The Overall Algorithm**

Having presented all the components, we now present the overall procedure, which integrates the semantic type system and the SMT-based method for gradual refinement. Algorithm 1 shows the pseudocode. Given the program *P*, the sets of public (*p*), secret (*k*), and random (*r*) variables, and an empty map π, it invokes SCInfer(*P*, *p*, *k*, *r*, π) to traverse the DDG in a topological order and annotate every node *l* with a distribution type from T. The subroutine TypeInfer implements the type inference rules outlined in Figs. 3 and 5, where the parameter *f* can be either supp or semd.

SCInfer first deduces the type of each node *l* ∈ *N* by invoking TypeInfer with *f* = supp. Once a node *l* is annotated as UKD, a simplified subtree *P*′ of the subtree rooted at *l* is constructed. Next, TypeInfer with *f* = semd is invoked to resolve the UKD node in *P*′. If π(*l*) becomes non-UKD afterward, TypeInfer with *f* = supp is invoked again to quickly deduce the types of the fan-out nodes in *P*. But if π(*l*) remains UKD, SCInfer invokes the incremental SMT-based approach to decide whether *l* is SID or NPM. This is sound and complete, unless the SMT solver runs out of time or memory, in which case UKD is assigned to *l*.
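The control flow just described can be sketched as follows (Algorithm 1's listing is not reproduced in this excerpt; TypeInfer, the subtree simplifier, and the SMT check appear here as stand-in parameters):

```python
def scinfer(nodes, pi, type_infer, simplify, smt_check_sid):
    """Gradual-refinement loop: a supp-based pass, a semd-based pass on a
    simplified subtree, then incremental SMT as a last resort."""
    for l in nodes:                      # topological order of the DDG
        pi[l] = type_infer(l, f="supp")
        if pi[l] == "UKD":
            pi[l] = type_infer(simplify(l), f="semd")
        if pi[l] == "UKD":
            try:
                pi[l] = "SID" if smt_check_sid(l) else "NPM"
            except TimeoutError:
                pi[l] = "UKD"            # solver gave up: leave unknown
    return pi
```

With stub arguments, e.g. `scinfer(["a"], {}, lambda l, f: "UKD", lambda l: l, lambda l: True)`, the loop falls through both inference passes and resolves the node by the SMT stand-in.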

### **Algorithm 1.** Function SCInfer(*P*, *p*, *k*, *r*, π)


**Theorem 4.** *For every intermediate computation result c* ← *i*(*p*, *k*, *r*) *labeled by l, our method in* SCInfer *guarantees to return sound and complete results:*


If a timeout or memory limit is used to bound the execution of the SMT solver, it is also possible that π(*l*) = UKD, meaning *c* has an unknown distribution (it may or may not be perfectly masked). It is interesting to note that, if we regard UKD as a *potential leak* and at the same time bound (or even disable) SMT-based analysis, Algorithm 1 degenerates to a *sound* type system that is both fast and potentially accurate.

# **5 Experiments**

We have implemented our method in a verification tool named SCInfer, which uses Z3 [23] as the underlying SMT solver. We also implemented the syntactic type inference approach [47] and the incremental SMT-based approach [26,27] in the same tool for experimental comparison purposes. We conducted experiments on publicly available cryptographic software implementations, including fragments of AES and MAC-Keccak [26,27]. Our experiments were conducted on a machine with 64-bit Ubuntu 12.04 LTS, Intel Xeon(R) CPU E5-2603 v4, and 32 GB RAM.

Overall, the results of our experiments show that (1) SCInfer is significantly more accurate than the prior syntactic type inference method [47]; indeed, it resolved tens of thousands of UKD cases reported by the prior technique; and (2) SCInfer is at least twice as fast as the prior SMT-based verification method [26,27] on the large programs while maintaining the same accuracy; for example, SCInfer verified the benchmark named P12 in a few seconds, whereas the prior SMT-based method took more than an hour.

**Algorithm 2.** Procedure TypeInfer(*l*, *P*, *p*, *k*, *r*, π, *f*)

```
Procedure TypeInfer(l, P, p, k, r, π, f)
  if λ₂(l) = ¬ then π(l) := π(l.lft);
  else if λ₂(l) = ⊕ then
    if π(l.lft) = RUD ∧ dom(l.lft) \ f(l.rgt) ≠ ∅ then π(l) := RUD;
    else if π(l.rgt) = RUD ∧ dom(l.rgt) \ f(l.lft) ≠ ∅ then π(l) := RUD;
    else if π(l.rgt) = π(l.lft) = SID ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then
      π(l) := SID;
    else if supp(l) ∩ k = ∅ then π(l) := SID;
    else π(l) := UKD;
  else
    if ((π(l.lft) = RUD ∧ π(l.rgt) ∉ {UKD, NPM}) ∨
        (π(l.rgt) = RUD ∧ π(l.lft) ∉ {UKD, NPM}))
       ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then π(l) := SID;
    else if (dom(l.rgt) \ f(l.lft)) ∪ (dom(l.lft) \ f(l.rgt)) ≠ ∅
            ∧ π(l.lft) = RUD ∧ π(l.rgt) = RUD then
      π(l) := SID;
    else if ((π(l.lft) = RUD ∧ π(l.rgt) = NPM) ∨
             (π(l.rgt) = RUD ∧ π(l.lft) = NPM))
            ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then π(l) := NPM;
    else if (π(l.lft) = RUD ∧ π(l.rgt) = NPM ∧ dom(l.lft) \ f(l.rgt) ≠ ∅) ∨
            (π(l.rgt) = RUD ∧ π(l.lft) = NPM ∧ dom(l.rgt) \ f(l.lft) ≠ ∅) then
      π(l) := NPM;
    else if (π(l.lft) = π(l.rgt) = SID) ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then
      π(l) := SID;
    else if supp(l) ∩ k = ∅ then π(l) := SID;
    else π(l) := UKD;
```
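As an illustration, the ⊕ branch of TypeInfer can be rendered in Python over precomputed sets (a sketch under our own encoding; the dom/f sets below are chosen to match *c*₃ = *c*₂ ⊕ *c*₁ of the running example with *f* = supp):

```python
RUD, SID, UKD, NPM = "RUD", "SID", "UKD", "NPM"

def type_infer_xor(pi_lft, pi_rgt, dom_lft, dom_rgt, f_lft, f_rgt, supp_l, K, R):
    """xor branch: f_lft/f_rgt stand for f(l.lft)/f(l.rgt), f in {supp, semd}."""
    if pi_lft == RUD and dom_lft - f_rgt:   # a dominant random bit of the left
        return RUD                          # child is fresh w.r.t. the right
    if pi_rgt == RUD and dom_rgt - f_lft:
        return RUD
    if pi_lft == pi_rgt == SID and not (f_lft & f_rgt & R):
        return SID
    if not (supp_l & K):                    # No-Key: no secret in the support
        return SID
    return UKD

# c3 = c2 xor c1: r1 dominates c2 and does not occur in c1, hence RUD.
t = type_infer_xor(RUD, RUD,
                   dom_lft={"r1", "r2"}, dom_rgt={"r2"},
                   f_lft={"r1", "r2"}, f_rgt={"k", "r2"},
                   supp_l={"k", "r1", "r2"},
                   K={"k"}, R={"r1", "r2", "r3"})
print(t)  # RUD
```

The rule order mirrors the listing above: RUD rules are tried first, then SID, with UKD as the fallback.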

### **5.1 Benchmarks**

Table 1 shows detailed statistics of the benchmarks, including seventeen examples (P1–P17), all of which have nonlinear operations. Columns 1 and 2 show the name of each program and a short description. Column 3 shows the number of instructions in the probabilistic Boolean program. Column 4 shows the number of DDG nodes denoting intermediate computation results. The remaining columns show the numbers of bits in the secret, public, and random variables, respectively. Note that the number of random variables used in each individual computation is far smaller than the total number in the program. All these programs are transformed into Boolean programs where each instruction has at most two operands. Since the statistics were collected from the transformed code, they may differ slightly from statistics reported in prior work [26,27].

In particular, P1–P5 are masking examples originating from [10], P6–P7 originate from [15], P8–P9 are the MAC-Keccak computation-reordered examples originating from [11], and P10–P11 are two experimental masking schemes for the Chi function in MAC-Keccak. Among the larger programs, P12–P17 are regenerations of


### **Table 1.** Benchmark statistics.

MAC-Keccak reference code submitted to the SHA-3 competition held by NIST, where P13–P16 implement the masking of the Chi function using different masking schemes and P17 implements the de-masking of the Pi function.

### **5.2 Experimental Results**

We compared the performance of SCInfer, the purely syntactic type inference method (denoted Syn. Infer) and the incremental SMT-based method (denoted by SMT App). Table 2 shows the results. Column 1 shows the name of each benchmark. Column 2 shows whether it is perfectly masked (ground truth). Columns 3–4 show the results of the purely syntactic type inference method, including the number of nodes inferred as UKD type and the time in seconds. Columns 5–7 (resp. Columns 8–10) show the results of the incremental SMT-based method (resp. our method SCInfer), including the number of leaky nodes (NPM type), the number of nodes actually checked by SMT, and the time.

Compared with the syntactic type inference method, our approach is significantly more accurate (e.g., see P4, P5 and P15). Furthermore, the time taken by the two methods is comparable on small programs. On the large programs that are not perfectly masked (i.e., P13–P17), our method is slower, since SCInfer has to resolve the UKD nodes reported by syntactic inference using SMT. However, it is interesting to note that, on the perfectly masked large program (P12), our method is faster.

Moreover, the UKD type nodes in P4, reported by the purely syntactic type inference method, are all proved to be perfectly masked by our semantic type inference system,


**Table 2.** Experimental results: comparison of three approaches.

without calling the SMT solver at all. As for the three UKD-type nodes in P5, our method proves them all by invoking the SMT solver only twice; this means that feeding back the new SID types (discovered by SMT) allows our type system to improve its accuracy, which turns the third UKD node into SID.

Finally, compared with the original SMT-based approach, our method is at least twice as fast on the large programs (e.g., P12–P17). Furthermore, the number of nodes actually checked by invoking the SMT solver is also lower than in the original SMT-based approach (e.g., P4–P5 and P17). In particular, there are 3200 UKD-type nodes in P17 that are refined into the NPM type by our new inference rules (cf. Fig. 5), thus avoiding the more expensive SMT calls.

To sum up, the results of our experiments show that SCInfer is fast in obtaining proofs for perfectly masked programs, retains the ability to detect real leaks in not-perfectly-masked programs, and is scalable enough to handle realistic applications.

#### **5.3 Detailed Statistics**

Table 3 shows more detailed statistics of our approach. Specifically, Columns 2–5 show the number of nodes of each distribution type deduced by our method. Column 6 shows the number of nodes actually checked by SMT, with the corresponding time shown in Column 9. Column 7 shows the time spent on computing the semd function, which involves solving SAT problems. Column 8 shows the time spent on computing the don't care variables. The last column shows the total time taken by SCInfer.


**Table 3.** Detailed statistics of our new method.

Results in Table 3 indicate that most of the DDG nodes in these benchmark programs are either RUD or SID, and almost all of them can be quickly deduced by our type system. This explains why our new method is more efficient than the original SMT-based approach. Indeed, the original SMT-based approach spent a large amount of time on the static analysis part, which performs code partitioning and applies the heuristic rules (cf. Sect. 4.1), whereas our method spent more time on computing the semd function.

Column 4 shows that, at least in these benchmark programs, Boolean constants are rare. Columns 5–6 show that, if our refined type system fails to prove perfect masking, it is usually not perfectly masked. Columns 7–9 show that, in our integrated method, most of the time is actually used to compute semd and don't care variables (SAT), while the time taken by the SMT solver to conduct model counting (SAT#) is relatively small.

### **6 Related Work**

Many masking countermeasures [15,17,34,37,41,43,46,48,50–52] have been published over the years: although they differ in adversary models, cryptographic algorithms, and compactness, a common problem is the lack of efficient tools to formally prove their correctness [21,22]. Our work aims to bridge this gap. It differs from simulation-based techniques [3,33,53], which aim to detect leaks only, as opposed to proving their absence. It also differs from techniques designed for other types of side channels, such as timing [2,38], faults [12,29], and caches [24,35,40], or from computing security bounds for probabilistic countermeasures against remote attacks [45].

Although some verification tools have been developed for this application [6,7,10,13,14,20,26,27,47], they are either fast but inaccurate (e.g., type-inference techniques) or accurate but slow (e.g., model-counting techniques). For example, Bayrak et al. [10] developed a leak detector that checks whether a computation result is *logically* dependent on the secret and, at the same time, *logically* independent of any random variable. It is fast but not accurate, in that many leaky nodes could be incorrectly proved [26,27]. In contrast, the model-counting-based method proposed by Eldib et al. [26–28] is accurate but significantly less scalable, because the size of the logical formulas they need to build is exponential in the number of random variables. Moreover, for higher-order masking, their method is still not complete.

Our gradual refinement of a set of semantic type inference rules was inspired by recent work on proving probabilistic non-interference [6,47], which exploits the unique characteristics of invertible operations. Similar ideas were explored in [7,14,20] as well. However, these prior techniques differ significantly from our method because their type-inference rules are syntactic and fixed, whereas ours are semantic and refined based on SMT-solver-based analysis (SAT and SAT#). In terms of accuracy, numerous unknowns occurred in the experimental results of [47], and two obviously perfect masking cases were not proved in [6]. Finally, although higher-order masking was addressed by prior techniques [13], they were limited to linear operations, whereas our method can handle both first-order and higher-order masking with non-linear operations.

An alternative way to address the model-counting problem [4,18,19,32] is to use satisfiability modulo counting, which is a generalization of the satisfiability problem of SMT extended with counting constraints [31]. Toward this end, Fredrikson and Jha [31] have developed an efficient decision procedure for linear integer arithmetic (LIA) based on Barvinok's algorithm [8] and also applied their approach to differential privacy.

Another related line of research is automatically synthesizing countermeasures [1, 7,9,16,25,44,54] as opposed to verifying them. While methods in [1,7,9,44] rely on compiler-like pattern matching, the ones in [16,25,54] use inductive program synthesis based on the SMT approach. These emerging techniques, however, are orthogonal to our work reported in this paper. It would be interesting to investigate whether our approach could aid in the synthesis of masking countermeasures.

### **7 Conclusions and Future Work**

We have presented a refinement based method for proving that a piece of cryptographic software code is free of power side-channel leaks. Our method relies on a set of semantic inference rules to reason about distribution types of intermediate computation results, coupled with an SMT solver based procedure for gradually refining these types to increase accuracy. We have implemented our method and demonstrated its efficiency and effectiveness on cryptographic benchmarks. Our results show that it outperforms state-of-the-art techniques in terms of both efficiency and accuracy.

For future work, we plan to evaluate our type inference system on higher-order masking, extend it to handle integer programs as opposed to bit-blasting them to Boolean programs, e.g., using satisfiability modulo counting [31], and investigate the synthesis of masking countermeasures based on our new verification method.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Symbolic Algorithms for Graphs and Markov Decision Processes with Fairness Objectives**

Krishnendu Chatterjee<sup>1(B)</sup>, Monika Henzinger<sup>2</sup>, Veronika Loitzenbauer<sup>3</sup>, Simin Oraee<sup>4</sup>, and Viktor Toman<sup>1</sup>

<sup>1</sup> IST Austria, Klosterneuburg, Austria
krish.chat@gmail.com
<sup>2</sup> University of Vienna, Vienna, Austria
<sup>3</sup> Johannes Kepler University Linz, Linz, Austria
<sup>4</sup> Max Planck Institute for Software Systems, Kaiserslautern, Germany

**Abstract.** Given a model and a specification, the fundamental model-checking problem asks for algorithmic verification of whether the model satisfies the specification. We consider graphs and Markov decision processes (MDPs), which are fundamental models for reactive systems. One of the very basic specifications that arise in verification of reactive systems is the strong fairness (aka Streett) objective. Given different types of requests and corresponding grants, the objective requires that for each type, if the request event happens infinitely often, then the corresponding grant event must also happen infinitely often. All ω-regular objectives can be expressed as Streett objectives and hence they are canonical in verification. To handle the state-space explosion, symbolic algorithms are required that operate on a succinct implicit representation of the system rather than explicitly accessing the system. While explicit algorithms for graphs and MDPs with Streett objectives have been widely studied, there has been no improvement of the basic symbolic algorithms. The worst-case numbers of symbolic steps required for the basic symbolic algorithms are as follows: quadratic for graphs and cubic for MDPs. In this work we present the first sub-quadratic symbolic algorithm for graphs with Streett objectives, and our algorithm is sub-quadratic even for MDPs. Based on our algorithmic insights we present an implementation of the new symbolic approach and show that it improves the existing approach on several academic benchmark examples.

# **1 Introduction**

In this work we present faster symbolic algorithms for graphs and Markov decision processes (MDPs) with strong fairness objectives. For the fundamental model-checking problem, the input consists of a *model* and a *specification*, and the algorithmic verification problem is to check whether the model *satisfies* the specification. We first describe the specific model-checking problem we consider and then our contributions.

*Models: Graphs and MDPs.* Two standard models for reactive systems are graphs and Markov decision processes (MDPs). Vertices of a graph represent states of a reactive system, edges represent transitions of the system, and infinite paths of the graph represent non-terminating trajectories of the reactive system. MDPs extend graphs with probabilistic transitions that represent reactive systems with uncertainty. Thus graphs and MDPs are the de-facto models of reactive systems with nondeterminism, and of nondeterminism combined with stochastic aspects, respectively [3,19].

*Specification: Strong Fairness (aka Streett) Objectives.* A basic and fundamental property in the analysis of reactive systems is the *strong fairness condition*, which informally requires that if events are enabled infinitely often, then they must be executed infinitely often. More precisely, the strong fairness conditions (aka Streett objectives) consist of k types of requests and corresponding grants, and the objective requires that for each type if the request happens infinitely often, then the corresponding grant must also happen infinitely often. After safety, reachability, and liveness, the strong fairness condition is one of the most standard properties that arise in the analysis of reactive systems, and chapters of standard textbooks in verification are devoted to it (e.g., [19, Chap. 3.3], [32, Chap. 3], [2, Chaps. 8, 10]). Moreover, all ω-regular objectives can be described by Streett objectives, e.g., LTL formulas and non-deterministic ω-automata can be translated to deterministic Streett automata [34] and efficient translation has been an active research area [16,23,28]. Thus Streett objectives are a canonical class of objectives that arise in verification.

*Satisfaction.* The basic notions of satisfaction for graphs and MDPs are as follows: For graphs the notion of satisfaction requires that there is a trajectory (infinite path) that belongs to the set of paths described by the Streett objective. For MDPs the satisfaction requires that there is a policy to resolve the nondeterminism such that the Streett objective is ensured almost-surely (with probability 1). Thus the algorithmic model-checking problem of graphs and MDPs with Streett objectives is a core problem in verification.

*Explicit vs Symbolic Algorithms.* The traditional algorithmic studies consider *explicit* algorithms that operate on the explicit representation of the system. In contrast, *implicit* or *symbolic* algorithms only use a set of predefined operations and do not explicitly access the system [20]. The significance of symbolic algorithms in verification is as follows: to combat the state-space explosion, large systems must be succinctly represented implicitly and then symbolic algorithms are scalable, whereas explicit algorithms do not scale as it is computationally too expensive to even explicitly construct the system.

*Relevance.* In this work we study symbolic algorithms for graphs and MDPs with Streett objectives. Symbolic algorithms for the analysis of graphs and MDPs are at the heart of many state-of-the-art tools such as SPIN, NuSMV for graphs [18,27] and PRISM, LiQuor, Storm for MDPs [17,22,29]. Our contributions are related to the algorithmic complexity of graphs and MDPs with Streett objectives for symbolic algorithms. We first present previous results and then our contributions.

*Previous Results.* The most basic algorithm for the problem for graphs is based on repeated SCC (strongly connected component) computation, and informally can be described as follows: for a given SCC, (a) if for every request type that is present in the SCC the corresponding grant type is also present in the SCC, then the SCC is identified as "good", (b) else vertices of each request type that has no corresponding grant type in the SCC are removed, and the algorithm recursively proceeds on the remaining graph. Finally, reachability to good SCCs is computed. The current best-known symbolic algorithm for SCC computation requires O(n) symbolic steps, for graphs with n vertices [25], and moreover, the algorithm is optimal [15]. For MDPs, the SCC computation has to be replaced by MEC (maximal end-component) computation, and the current best-known symbolic algorithm for MEC computation requires O(n<sup>2</sup>) symbolic steps. While there have been several explicit algorithms for graphs with Streett objectives [12,26], MEC computation [8–10], and MDPs with Streett objectives [7], as well as symbolic algorithms for MDPs with Büchi objectives [11], the current best-known bounds for symbolic algorithms with Streett objectives are obtained from the basic algorithms, which are O(n · min(n, k)) for graphs and O(n<sup>2</sup> · min(n, k)) for MDPs, where k is the number of types of request-grant pairs.

*Our Contributions.* In this work our main contributions are as follows:


*Technical Contributions.* The two key technical contributions of our work are as follows:

– *Symbolic Lock-Step Search:* We search for newly emerged SCCs by a local graph exploration around vertices that lost adjacent edges. In order to find small new SCCs first, all searches are conducted "in parallel", i.e., in lock-step, and the searches stop as soon as the first one finishes successfully. This approach has successfully been used to improve explicit algorithms [7,9,14,26]. Our contribution is a non-trivial symbolic variant (Sect. 3) which lies at the core of the theoretical improvements.


**Table 1.** Symbolic algorithms for Streett objectives and MEC decomposition.

– *Symbolic Interleaved MEC Computation:* For MDPs the identification of vertices that have to be removed can be interleaved with the computation of MECs such that in each iteration the computation of SCCs instead of MECs is sufficient to make progress [7]. We present a symbolic variant of this interleaved computation. This interleaved MEC computation is the basis for applying the lock-step search to MDPs.

# **2 Definitions**

### **2.1 Basic Problem Definitions**

*Markov Decision Processes* (*MDPs*) *and Graphs.* An MDP P = ((V, E), (V<sub>1</sub>, V<sub>R</sub>), δ) consists of a finite directed graph G = (V, E) with a set V of n vertices and a set E of m edges, a partition of the vertices into *player 1 vertices* V<sub>1</sub> and *random vertices* V<sub>R</sub>, and a probabilistic transition function δ. We call an edge (u, v) with u ∈ V<sub>1</sub> a *player 1 edge* and an edge (v, w) with v ∈ V<sub>R</sub> a *random edge*. For v ∈ V we define *In*(v) = {w ∈ V | (w, v) ∈ E} and *Out*(v) = {w ∈ V | (v, w) ∈ E}. The probabilistic transition function is a function from V<sub>R</sub> to D(V), where D(V) is the set of probability distributions over V, and a random edge (v, w) ∈ E if and only if δ(v)[w] > 0. Graphs are a special case of MDPs with V<sub>R</sub> = ∅.
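To make the definition concrete, here is a minimal Python sketch (the class and helper names are illustrative, not from the paper) of an MDP together with a consistency check that the random edges are exactly the positive-probability transitions of δ:

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """A finite MDP ((V, E), (V1, VR), delta) as in Sect. 2.1."""
    vertices: set   # V
    edges: set      # E, a set of (u, v) pairs
    player1: set    # V1
    random: set     # VR
    delta: dict     # maps v in VR to a distribution {w: probability}

    def check(self):
        # (V1, VR) is a partition of V
        assert self.player1 | self.random == self.vertices
        assert not (self.player1 & self.random)
        for v in self.random:
            dist = self.delta[v]
            assert abs(sum(dist.values()) - 1.0) < 1e-9
            # random edge (v, w) in E  iff  delta(v)[w] > 0
            assert ({w for w, p in dist.items() if p > 0}
                    == {w for (u, w) in self.edges if u == v})

def out(mdp, v):
    """Out(v): successors of v."""
    return {w for (u, w) in mdp.edges if u == v}

def inc(mdp, v):
    """In(v): predecessors of v."""
    return {u for (u, w) in mdp.edges if w == v}
```

A graph is then simply an `MDP` with `random=set()` and an empty `delta`.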

*Plays and Strategies.* A *play* or infinite path in P is an infinite sequence ω = v<sub>0</sub>, v<sub>1</sub>, v<sub>2</sub>,... such that (v<sub>i</sub>, v<sub>i+1</sub>) ∈ E for all i ∈ ℕ; we denote by Ω the set of all plays. A player 1 *strategy* σ : V<sup>*</sup> · V<sub>1</sub> → V is a function that assigns to every finite prefix ω ∈ V<sup>*</sup> · V<sub>1</sub> of a play that ends in a player 1 vertex v a successor vertex σ(ω) ∈ V such that (v, σ(ω)) ∈ E; we denote by Σ the set of all player 1 strategies. A strategy is *memoryless* if we have σ(ω) = σ(ω′) for any ω, ω′ ∈ V<sup>*</sup> · V<sub>1</sub> that end in the same vertex v ∈ V<sub>1</sub>.

*Objectives.* An *objective* φ is a subset of Ω said to be winning for player 1. We say that a play ω ∈ Ω *satisfies the objective* if ω ∈ φ. For a vertex set T ⊆ V the *reachability objective* is the set of infinite paths that contain a vertex of T, i.e., Reach(T) = {v<sub>0</sub>, v<sub>1</sub>, v<sub>2</sub>,... ∈ Ω | ∃j ≥ 0 : v<sub>j</sub> ∈ T}. Let Inf(ω) for ω ∈ Ω denote the set of vertices that occur infinitely often in ω. Given a set TP of k pairs (L<sub>i</sub>, U<sub>i</sub>) of vertex sets L<sub>i</sub>, U<sub>i</sub> ⊆ V with 1 ≤ i ≤ k, the *Streett objective* is the set of infinite paths for which it holds *for each* 1 ≤ i ≤ k that whenever a vertex of L<sub>i</sub> occurs infinitely often, then a vertex of U<sub>i</sub> occurs infinitely often, i.e., Streett(TP) = {ω ∈ Ω | L<sub>i</sub> ∩ Inf(ω) = ∅ or U<sub>i</sub> ∩ Inf(ω) ≠ ∅ for all 1 ≤ i ≤ k}.
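Since membership in Streett(TP) depends only on Inf(ω), a play given in ultimately periodic form (a finite prefix followed by a repeated cycle) can be checked with plain set operations. A small Python sketch (function names are ours, not the paper's):

```python
def inf_vertices(prefix, cycle):
    """Inf(omega) of the ultimately periodic play prefix · cycle^omega:
    exactly the vertices on the repeated cycle."""
    return set(cycle)

def satisfies_streett(inf_set, pairs):
    """Streett condition: for every pair (L_i, U_i), if some vertex of L_i
    occurs infinitely often then some vertex of U_i does as well."""
    return all(not (L & inf_set) or bool(U & inf_set) for (L, U) in pairs)
```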

*Almost-Sure Winning Sets.* For any measurable set of plays A ⊆ Ω we denote by Pr<sup>σ</sup><sub>v</sub>(A) the probability that a play starting at v ∈ V belongs to A when player 1 plays strategy σ. A strategy σ is *almost-sure* (a.s.) *winning* from a vertex v ∈ V for an objective φ if Pr<sup>σ</sup><sub>v</sub>(φ) = 1. The *almost-sure winning set* ⟨⟨1⟩⟩<sub>*as*</sub>(P, φ) of player 1 is the set of vertices for which player 1 has an almost-sure winning strategy. In graphs the existence of an almost-sure winning strategy corresponds to the existence of a play in the objective, and the set of vertices for which player 1 has an (almost-sure) winning strategy is called the *winning set* ⟨⟨1⟩⟩(P, φ) of player 1.

*Symbolic Encoding of MDPs.* Symbolic algorithms operate on sets of vertices, which are usually described by binary decision diagrams (BDDs) [1,30]. In particular, ordered binary decision diagrams (OBDDs) [6] provide a canonical symbolic representation of Boolean functions. For the computation of almost-sure winning sets of MDPs it is sufficient to encode MDPs with OBDDs and one additional bit that denotes whether a vertex is in V<sub>1</sub> or V<sub>R</sub>.

*Symbolic Steps.* One symbolic step corresponds to one primitive operation as supported by standard symbolic packages like CuDD [35]. In this paper we only allow the same basic *set-based symbolic operations* as in [5,11,24,33], namely set operations and the following one-step symbolic operations for a set of vertices Z: (a) the one-step predecessor operator Pre(Z) = {v ∈ V | *Out*(v) ∩ Z ≠ ∅}; (b) the one-step successor operator Post(Z) = {v ∈ V | *In*(v) ∩ Z ≠ ∅}; and (c) the one-step *controllable* predecessor operator CPre<sub>R</sub>(Z) = {v ∈ V<sub>1</sub> | *Out*(v) ⊆ Z} ∪ {v ∈ V<sub>R</sub> | *Out*(v) ∩ Z ≠ ∅}; i.e., the CPre<sub>R</sub> operator computes all vertices such that the successor belongs to Z with positive probability. This operator can be defined using the Pre operator and basic set operations as follows: CPre<sub>R</sub>(Z) = Pre(Z) \ (V<sub>1</sub> ∩ Pre(V \ Z)). We additionally allow cardinality computation and picking an arbitrary vertex from a set as in [11].
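The one-step operators are easy to mimic over explicit edge sets. The sketch below (a toy stand-in for BDD-based sets, with names of our choosing) also cross-checks the identity CPre<sub>R</sub>(Z) = Pre(Z) \ (V<sub>1</sub> ∩ Pre(V \ Z)) against the direct definition:

```python
def pre(edges, Z):
    """Pre(Z): vertices with at least one successor in Z."""
    return {u for (u, v) in edges if v in Z}

def post(edges, Z):
    """Post(Z): vertices with at least one predecessor in Z."""
    return {v for (u, v) in edges if u in Z}

def cpre_r(edges, V, V1, Z):
    """CPre_R(Z) via the identity Pre(Z) \\ (V1 ∩ Pre(V \\ Z))."""
    return pre(edges, Z) - (V1 & pre(edges, V - Z))

def cpre_r_direct(edges, V1, VR, Z):
    """CPre_R(Z) directly from the definition, for cross-checking:
    player-1 vertices with Out(v) ⊆ Z, random vertices with Out(v) ∩ Z ≠ ∅."""
    out = lambda v: {w for (u, w) in edges if u == v}
    return ({v for v in V1 if out(v) <= Z}
            | {v for v in VR if out(v) & Z})
```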

*Symbolic Model.* Informally, a symbolic algorithm does not operate on explicit representation of the transition function of a graph, but instead accesses it through Pre and Post operations. For explicit algorithms, a Pre/Post operation on a set of vertices (resp., a single vertex) requires O(m) (resp., the order of indegree/outdegree of the vertex) time. In contrast, for symbolic algorithms Pre/Post operations are considered unit-cost. Thus an interesting algorithmic question is whether better algorithmic bounds can be obtained considering Pre/Post as unit operations. Moreover, the basic set operations are computationally less expensive (as they encode the relationship between the state variables) compared to the Pre/Post symbolic operations (as they encode the transitions and thus the relationship between the present and the next-state variables). In all presented algorithms, the number of set operations is asymptotically at most the number of Pre/Post operations. Hence in the sequel we focus on the number of Pre/Post operations of algorithms.

*Algorithmic Problem.* Given an MDP P (resp. a graph G) and a set of Streett pairs TP, the problem we consider asks for a symbolic algorithm to compute the almost-sure winning set ⟨⟨1⟩⟩<sub>*as*</sub>(P, Streett(TP)) (resp. the winning set ⟨⟨1⟩⟩(G, Streett(TP))), which is also called the *qualitative analysis* of MDPs (resp. graphs).

# **2.2 Basic Concepts Related to Algorithmic Solution**

*Reachability.* For a graph G = (V, E) and a set of vertices S ⊆ V the set GraphReach(G, S) is the set of vertices of V that *can reach* a vertex of S within G, and it can be identified with at most |GraphReach(G, S)\S| + 1 many Pre operations.
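A sketch of GraphReach via repeated Pre operations (Python, names ours); every Pre except the last discovers at least one new vertex, which is exactly where the |GraphReach(G, S)\S| + 1 bound comes from:

```python
def graph_reach(edges, S):
    """GraphReach(G, S): vertices that can reach S, via repeated Pre.
    Returns the reachable set and the number of Pre operations used."""
    reach, pre_ops = set(S), 0
    while True:
        pre_ops += 1
        # one Pre operation, restricted to newly discovered vertices
        new = {u for (u, v) in edges if v in reach} - reach
        if not new:                      # fixpoint: last Pre added nothing
            return reach, pre_ops
        reach |= new
```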

*Strongly Connected Components.* For a set of vertices S ⊆ V we denote by G[S]=(S, E ∩(S×S)) the subgraph of the graph G induced by the vertices of S. An induced subgraph G[S] is strongly connected if there exists a path in G[S] between every pair of vertices of S. A *strongly connected component* (*SCC* ) of G is a set of vertices C ⊆ V such that the induced subgraph G[C] is strongly connected and C is a maximal set in V with this property. We call an SCC *trivial* if it only contains a single vertex and no edges; and *non-trivial* otherwise. The SCCs of G partition its vertices and can be found in O(n) symbolic steps [25]. A bottom SCC C in a directed graph G is an SCC with no edges from vertices of C to vertices of V \C, i.e., an SCC without *outgoing* edges. Analogously, a top SCC C is an SCC with no *incoming* edges from V \C. For more intuition for bottom and top SCCs, consider the graph in which each SCC is contracted into a single vertex (ignoring edges within an SCC). In the resulting directed acyclic graph the sinks represent the bottom SCCs and the sources represent the top SCCs. Note that every graph has at least one bottom and at least one top SCC. If the graph is not strongly connected, then there exist at least one top and at least one bottom SCC that are disjoint and thus one of them contains at most half of the vertices of G.

*Random Attractors.* In an MDP P the *random attractor Attr*<sub>R</sub>(P, W) of a set of vertices W is defined as *Attr*<sub>R</sub>(P, W) = ⋃<sub>j≥0</sub> Z<sub>j</sub> where Z<sub>0</sub> = W and Z<sub>j+1</sub> = Z<sub>j</sub> ∪ CPre<sub>R</sub>(Z<sub>j</sub>) for all j ≥ 0. The attractor can be computed with at most |*Attr*<sub>R</sub>(P, W)\W| + 1 many CPre<sub>R</sub> operations.
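The attractor computation can be sketched the same way, iterating CPre<sub>R</sub> (here expressed via the Pre-based identity from the definition of symbolic steps) to a fixpoint; as before this is a toy explicit-set stand-in, not the symbolic implementation:

```python
def attr_r(edges, V, V1, W):
    """Random attractor: Z_0 = W, Z_{j+1} = Z_j ∪ CPre_R(Z_j), to fixpoint."""
    pre = lambda Z: {u for (u, v) in edges if v in Z}
    Z = set(W)
    while True:
        # CPre_R(Z) = Pre(Z) \ (V1 ∩ Pre(V \ Z))
        nxt = Z | (pre(Z) - (V1 & pre(V - Z)))
        if nxt == Z:
            return Z
        Z = nxt
```

In the example below, vertex 3 is random with a positive-probability edge into W and vertex 2 is a player 1 vertex with all successors in the attractor, so both are attracted; vertex 4 can escape via vertex 5.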

*Maximal End-Components.* Let X be a vertex set without outgoing random edges, i.e., with *Out*(v) ⊆ X for all v ∈ X ∩ V<sub>R</sub>. A sub-MDP of an MDP P induced by a vertex set X ⊆ V without outgoing random edges is defined as P[X] = ((X, E ∩ (X × X)), (V<sub>1</sub> ∩ X, V<sub>R</sub> ∩ X), δ). Note that the requirement that X has no outgoing random edges is necessary in order to use the same probabilistic transition function δ. An *end-component* (EC) of an MDP P is a set of vertices X ⊆ V such that (a) X has no outgoing random edges, i.e., P[X] is a valid sub-MDP, (b) the induced sub-MDP P[X] is strongly connected, and (c) P[X] contains at least one edge. Intuitively, an end-component is a set of vertices for which player 1 can ensure that the play stays within the set and almost-surely reaches all the vertices in the set (infinitely often). An end-component is a *maximal end-component* (MEC) if it is maximal under set inclusion. An end-component is *trivial* if it consists of a single vertex (with a self-loop), otherwise it is *non-trivial*. The *MEC decomposition* of an MDP consists of all MECs of the MDP.
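The three conditions of the end-component definition can be checked directly; a small Python sketch over explicit edge sets (names ours, and with strong connectivity tested naively by forward reachability from every vertex):

```python
def is_end_component(edges, VR, X):
    """Check conditions (a)-(c) of an end-component for vertex set X."""
    # (a) no random vertex in X has an edge leaving X
    if any(u in VR and u in X and v not in X for (u, v) in edges):
        return False
    sub = {(u, v) for (u, v) in edges if u in X and v in X}
    # (c) the induced sub-MDP contains at least one edge
    if not sub:
        return False
    def reach(s):
        # forward reachability inside X
        seen = {s}
        while True:
            new = {v for (u, v) in sub if u in seen} - seen
            if not new:
                return seen
            seen |= new
    # (b) strongly connected: every vertex reaches every other vertex
    return all(reach(x) == set(X) for x in X)
```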

*Good End-Components.* All algorithms for MDPs with Streett objectives are based on finding good end-components, defined below. Given the union of all good end-components, the almost-sure winning set is obtained by computing the almost-sure winning set for the reachability objective with the union of all good end-components as the target set. The correctness of this approach is shown in [7,31] (see also [3, Chap. 10.6.3]). For Streett objectives a good end-component is defined as follows. In the special case of graphs they are called good components.

**Definition 1 (Good end-component).** *Given an MDP* P *and a set* TP = {(L<sub>j</sub>, U<sub>j</sub>) | 1 ≤ j ≤ k} *of target pairs, a* good end-component *is an end-component* X *of* P *such that for each* 1 ≤ j ≤ k *either* L<sub>j</sub> ∩ X = ∅ *or* U<sub>j</sub> ∩ X ≠ ∅*. A maximal good end-component is a good end-component that is maximal with respect to set inclusion.*

**Lemma 1 (Correctness of Computing Good End-Components** [31, **Corollary 2.6.5, Proposition 2.6.9]).** *For an MDP* P *and a set* TP *of target pairs, let* 𝒳 *be the set of all maximal good end-components. Then* ⟨⟨1⟩⟩<sub>*as*</sub>(P, *Reach*(⋃<sub>X∈𝒳</sub> X)) *is equal to* ⟨⟨1⟩⟩<sub>*as*</sub>(P, *Streett*(TP))*.*

*Iterative Vertex Removal.* All the algorithms for Streett objectives maintain vertex sets that are candidates for good end-components. For such a vertex set S we (a) refine the maintained sets according to the SCC decomposition of P[S] and (b) for a set of vertices W for which we know that it cannot be contained in a good end-component, we remove its random attractor from S. The following lemma shows the correctness of these operations.

**Lemma 2 (Correctness of Vertex Removal** [31, **Lemma 2.6.10]).** *Given an MDP* P = ((V, E), (V<sub>1</sub>, V<sub>R</sub>), δ)*, let* X *be an end-component with* X ⊆ S *for some* S ⊆ V *. Then*

*(a)* X ⊆ C *for one SCC* C *of* P[S] *and*

*(b)* X ⊆ S \ *Attr*<sub>R</sub>(P′, W) *for each* W ⊆ V\X *and each sub-MDP* P′ *containing* X*.*

Let X be a good end-component. Then X is an end-component and for each index j, X ∩ U<sub>j</sub> = ∅ implies X ∩ L<sub>j</sub> = ∅. Hence we obtain the following corollary.

**Corollary 1 (**[31, **Corollary 4.2.2]).** *Given an MDP* P*, let* X *be a* good *end-component with* X ⊆ S *for some* S ⊆ V *. For each* i *with* S ∩ U<sub>i</sub> = ∅ *it holds that* X ⊆ S \ *Attr*<sub>R</sub>(P[S], L<sub>i</sub> ∩ S)*.*

For an index j with S ∩ U<sub>j</sub> = ∅ we call the vertices of S ∩ L<sub>j</sub> *bad vertices*. The set of all bad vertices Bad(S) = ⋃<sub>1≤i≤k</sub> {v ∈ L<sub>i</sub> ∩ S | U<sub>i</sub> ∩ S = ∅} can be computed with 2k set operations.
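Computing Bad(S) takes one emptiness test of U<sub>i</sub> ∩ S and, when it is empty, one intersection L<sub>i</sub> ∩ S per pair, i.e., at most 2k set operations in total. A minimal sketch (names ours):

```python
def bad(S, pairs):
    """Bad(S): union of L_i ∩ S over all pairs with U_i ∩ S = ∅."""
    B = set()
    for (L, U) in pairs:
        if not (U & S):      # emptiness test of U_i ∩ S
            B |= (L & S)     # collect the bad vertices L_i ∩ S
    return B
```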

# **3 Symbolic Divide-and-Conquer with Lock-Step Search**

In this section we present a symbolic version of the lock-step search for strongly connected subgraphs [26]. This symbolic version is used in all subsequent results, i.e., the sub-quadratic symbolic algorithms for graphs and MDPs with Streett objectives, and for MEC decomposition.

*Divide-and-Conquer.* The common property of the algorithmic problems we consider in this work is that the goal is to identify subgraphs of the input graph G = (V,E) that are strongly connected and satisfy some additional properties. The difference between the problems lies in the required additional properties. We describe and analyze the Procedure Lock-Step-Search that we use in all our improved algorithms to efficiently implement a divide-and-conquer approach based on the requirement of strong connectivity, that is, we divide a subgraph G[S], induced by a set of vertices S, into two parts that are not strongly connected within G[S] or detect that G[S] is strongly connected.

*Start Vertices of Searches.* The input to Procedure Lock-Step-Search is a set of vertices S ⊆ V and two subsets of S denoted by H<sub>S</sub> and T<sub>S</sub>. In the algorithms that call the procedure as a subroutine, vertices contained in H<sub>S</sub> have lost incoming edges (i.e., they were a "head" of a lost edge) and vertices contained in T<sub>S</sub> have lost outgoing edges (i.e., they were a "tail" of a lost edge) since the last time a superset of S was identified as being strongly connected. For each vertex h of H<sub>S</sub> the procedure conducts a backward search (i.e., a sequence of Pre operations) within G[S] to find the vertices of S that can reach h; and analogously a forward search (i.e., a sequence of Post operations) from each vertex t of T<sub>S</sub> is conducted.

*Intuition for the Choice of Start Vertices.* If the subgraph G[S] is not strongly connected, then it contains at least one top SCC and at least one bottom SCC that are disjoint. Further, if for a superset S′ ⊃ S the subgraph G[S′] was strongly connected, then each top SCC of G[S] contains a vertex that had an additional incoming edge in G[S′] compared to G[S], and analogously each bottom SCC of G[S] contains a vertex that had an additional outgoing edge. Thus by keeping track of the vertices that lost incoming or outgoing edges, the following invariant is maintained by all our improved algorithms.

**Invariant 1 (Start Vertices Sufficient).** *We have* H<sub>S</sub>, T<sub>S</sub> ⊆ S*. Either* (*a*) H<sub>S</sub> ∪ T<sub>S</sub> = ∅ *and* G[S] *is strongly connected or* (*b*) *at least one vertex of each top SCC of* G[S] *is contained in* H<sub>S</sub> *and at least one vertex of each bottom SCC of* G[S] *is contained in* T<sub>S</sub>*.*

*Lock-Step Search.* The searches from the vertices of H<sub>S</sub> ∪ T<sub>S</sub> are performed in *lock-step*, that is, (a) one step is performed in each of the searches before the next step of any search is done and (b) all searches stop as soon as the first of the searches finishes. This is implemented in Procedure Lock-Step-Search as follows. A step in the search from a vertex t ∈ T<sub>S</sub> (and analogously for h ∈ H<sub>S</sub>) corresponds to the execution of the iteration of the for-each loop for t ∈ T<sub>S</sub>. In an iteration of a for-each loop we might discover that we do not need to consider this search further (see the paragraph on ensuring strong connectivity below) and update the set T<sub>S</sub> (via T′<sub>S</sub>) for future iterations accordingly. Otherwise the set C<sub>t</sub> is either strictly increasing in this step of the search or the search for t

```
Procedure. Lock-Step-Search(G, S, HS, TS)
```


terminates and we return the set of vertices in G[S] that are reachable from t. So the two for-each loops over the vertices of T<sub>S</sub> and H<sub>S</sub> that are executed in an iteration of the while-loop perform one step of each of the searches and the while-loop stops as soon as a search stops, i.e., a return statement is executed, and hence this implements properties (a) and (b) of lock-step search. Note that the while-loop terminates, i.e., a return statement is executed eventually, because for all t ∈ T<sub>S</sub> (and resp. for all h ∈ H<sub>S</sub>) the sets C<sub>t</sub> are monotonically increasing over the iterations of the while-loop, we have C<sub>t</sub> ⊆ S, and if some set C<sub>t</sub> does not increase in an iteration, then it is either removed from T<sub>S</sub> and thus not considered further or a return statement is executed. Note that when a search from a vertex t ∈ T<sub>S</sub> stops, it has discovered a maximal set of vertices C that can be reached from t; and analogously for h ∈ H<sub>S</sub>. Figure 1 shows a small intuitive example of a call to the procedure.

*Comparison to Explicit Algorithm.* In the *explicit* version of the algorithm [7,26] the search from vertex t ∈ T<sub>S</sub> performs a depth-first search that terminates exactly when every *edge* reachable from t is explored. Since any search that starts outside of a bottom SCC but reaches the bottom SCC has to explore more edges than the search started inside of the bottom SCC, the first search from a vertex of T<sub>S</sub> that terminates has exactly explored (one of) the smallest (in the number of edges) bottom SCC(s) of G[S]. Thus on explicit graphs the explicit lock-step search from the vertices of H<sub>S</sub> ∪ T<sub>S</sub> finds (one of) the smallest (in the number of edges) top or bottom SCC(s) of G[S] in time proportional to the number of searches times the number of edges in the identified SCC. In *symbolically* represented graphs it can happen (1) that a search started outside of a bottom (resp. top) SCC terminates earlier than the search started within

**Fig. 1.** An example of symbolic lock-step search showing the first three iterations of the main while-loop. Note that during the second iteration, the search started from t<sub>1</sub> is disregarded since it collides with t<sub>2</sub>. In the subsequent fourth iteration, the search started from t<sub>2</sub> is returned by the procedure.

the bottom (resp. top) SCC and (2) that a search started in a larger (in the number of vertices) top or bottom SCC terminates before one in a smaller top or bottom SCC. We discuss next how we address these two challenges.

*Ensuring Strong Connectivity.* First, we would like the set returned by Procedure Lock-Step-Search to indeed be a top or bottom SCC of G[S]. For this we use the following observation for bottom SCCs, which applies to top SCCs analogously. If a search starting from a vertex t<sub>1</sub> ∈ T<sub>S</sub> encounters another vertex t<sub>2</sub> ∈ T<sub>S</sub>, t<sub>1</sub> ≠ t<sub>2</sub>, there are two possibilities: either (1) both vertices are in the same SCC or (2) t<sub>1</sub> can reach t<sub>2</sub> but not vice versa. In Case (1) the searches from both vertices can explore all vertices in the SCC and thus it is sufficient to only search from one of them. In Case (2) the SCC of t<sub>1</sub> has an outgoing edge and thus cannot be a bottom SCC. Hence in both cases we can remove the vertex t<sub>1</sub> from the set T<sub>S</sub> while still maintaining Invariant 1. By Invariant 1 we further have that each search from a vertex of T<sub>S</sub> that is not in a bottom SCC encounters another vertex of T<sub>S</sub> in its search and therefore is removed from the set T<sub>S</sub> during Procedure Lock-Step-Search (if no top or bottom SCC is found earlier). This ensures that the returned set is either a top or a bottom SCC.<sup>1</sup>

*Bound on Symbolic Steps.* Second, observe that we can still bound the number of symbolic steps needed for the search that terminates first by the number of *vertices* in the smallest top or bottom SCC of G[S], since this is an upper bound on the symbolic steps needed for the search started in this SCC. Thus provided Invariant 1 holds, we can bound the number of symbolic steps in Procedure Lock-Step-Search to identify a vertex set C ⊊ S such that C and S\C are not strongly connected in G[S] by O((|H<sub>S</sub>| + |T<sub>S</sub>|) · min(|C|, |S\C|)). In the algorithms that call Procedure Lock-Step-Search we charge the number of symbolic steps in the procedure to the vertices in the smaller set of C and S\C; this ensures that each vertex is charged at most O(log n) times over the whole algorithm. We obtain the following result (proof in [13, Appendix A]).

<sup>1</sup> To improve the practical performance, we return the updated sets H<sub>S</sub> and T<sub>S</sub>. By the above argument this preserves Invariant 1.

**Theorem 1 (Lock-Step Search).** *Provided Invariant 1 holds, Procedure* Lock-Step-Search(G, S, H<sub>S</sub>, T<sub>S</sub>) *returns a top or bottom SCC* C *of* G[S]*. It uses* O((|H<sub>S</sub>| + |T<sub>S</sub>|) · min(|C|, |S\C|)) *symbolic steps if* C ≠ S *and* O((|H<sub>S</sub>| + |T<sub>S</sub>|) · |C|) *otherwise.*
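A much-simplified Python sketch of the lock-step idea (explicit sets instead of symbolic ones, without the updated H<sub>S</sub>, T<sub>S</sub> returned per the footnote; all names are ours): forward searches from T<sub>S</sub> and backward searches from H<sub>S</sub> each advance one step per round, a search colliding with another live start vertex of the same kind is dropped, and the first search that stops growing has found a bottom resp. top SCC, assuming Invariant 1 holds.

```python
def lock_step_search(edges, S, HS, TS):
    """Return a top or bottom SCC of G[S], assuming Invariant 1."""
    post = lambda Z: {v for (u, v) in edges if u in Z and v in S}
    pre = lambda Z: {u for (u, v) in edges if v in Z and u in S}
    fwd = {t: {t} for t in TS}   # forward searches: candidate bottom SCCs
    bwd = {h: {h} for h in HS}   # backward searches: candidate top SCCs
    while fwd or bwd:
        for searches, step in ((fwd, post), (bwd, pre)):
            for s in list(searches):
                C = searches[s]
                if (C - {s}) & set(searches):
                    del searches[s]       # collided with another start: drop
                    continue
                grown = C | step(C)
                if grown == C:            # search finished: top/bottom SCC
                    return C
                searches[s] = grown
    # no start vertices: by Invariant 1, G[S] is strongly connected
    return set(S)
```

In the test below, an edge from vertex 4 back to vertex 1 has been "lost", so 1 is a head and 4 is a tail; the forward search from 4 finishes first and returns the bottom SCC {3, 4}.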

# **4 Graphs with Streett Objectives**

**Basic Symbolic Algorithm.** Recall that for a given graph (with n vertices) and a Streett objective (with k target pairs) each non-trivial strongly connected subgraph without bad vertices is a good component. The basic symbolic algorithm for graphs with Streett objectives repeatedly removes bad vertices from each SCC and then recomputes the SCCs until all good components are found. The winning set then consists of the vertices that can reach a good component. We refer to this algorithm as StreettGraphBasic. For the pseudocode and more details see [13, Appendix B].

**Proposition 1.** *Algorithm* StreettGraphBasic *correctly computes the winning set in graphs with Streett objectives and requires* O(n · min(n, k)) *symbolic steps.*

**Improved Symbolic Algorithm.** In our improved symbolic algorithm we replace the recomputation of all SCCs with the search for a new top or bottom SCC with Procedure Lock-Step-Search from vertices that have lost adjacent edges whenever there are not too many such vertices. We present the improved symbolic algorithm for graphs with Streett objectives in more detail as it also conveys important intuition for the MDP case. The pseudocode is given in Algorithm StreettGraphImpr.

*Iterative Refinement of Candidate Sets.* The improved algorithm maintains a set goodC of already identified good components that is initially empty and a set X of candidates for good components that is initialized with the SCCs of the input graph G. The difference to the basic algorithm lies in the properties of the vertex sets maintained in X and the way we identify sets that can be separated from each other without destroying a good component. In each iteration one vertex set S is removed from X and, after the removal of bad vertices from the set, either identified as a good component or split into several candidate sets. By Lemma 2 and Corollary 1 the following invariant is maintained throughout the algorithm for the sets in goodC and X.

**Invariant 2 (Maintained Sets).** *The sets in* X ∪ goodC *are pairwise disjoint and for every good component* C *of* G *there exists a set* Y ⊇ C *such that either* Y ∈ X *or* Y ∈ goodC*.*

*Lost Adjacent Edges.* In contrast to the basic algorithm, the subgraph induced by a set S contained in X is not necessarily strongly connected. Instead, we remember vertices of S that have lost adjacent edges since the last time a superset of S was determined to induce a strongly connected subgraph; vertices that lost incoming edges are contained in H_S and vertices that lost outgoing edges are contained in T_S. In this way we maintain Invariant 1 throughout the algorithm, which enables us to use Procedure Lock-Step-Search with the running time guarantee provided by Theorem 1.

**Algorithm** StreettGraphImpr**.** Improved Algorithm for Graphs with Streett Objectives

```
Input:  graph G = (V, E) and Streett pairs TP = {(L_i, U_i) | 1 ≤ i ≤ k}
Output: winning set ⟨1⟩(G, Streett(TP))
 1  X ← allSCCs(G); goodC ← ∅
 2  foreach C ∈ X do H_C ← ∅; T_C ← ∅
 3  while X ≠ ∅ do
 4      remove some S ∈ X from X
 5      B ← ⋃_{1≤i≤k: U_i ∩ S = ∅} (L_i ∩ S)
 6      while B ≠ ∅ do
 7          S ← S \ B
 8          H_S ← (H_S ∪ Post(B)) ∩ S
 9          T_S ← (T_S ∪ Pre(B)) ∩ S
10          B ← ⋃_{1≤i≤k: U_i ∩ S = ∅} (L_i ∩ S)
11      if Post(S) ∩ S ≠ ∅ then          /* G[S] contains at least one edge */
12          if |H_S| + |T_S| = 0 then goodC ← goodC ∪ {S}
13          else if |H_S| + |T_S| ≥ m / log n then
14              delete H_S and T_S
15              C ← allSCCs(G[S])
16              if |C| = 1 then goodC ← goodC ∪ {S}
17              else
18                  foreach C ∈ C do H_C ← ∅; T_C ← ∅
19                  X ← X ∪ C
20          else
21              (C, H_S, T_S) ← Lock-Step-Search(G, S, H_S, T_S)
22              if C = S then goodC ← goodC ∪ {S}
23              else                      /* separate C and S \ C */
24                  S ← S \ C
25                  H_C ← ∅; T_C ← ∅
26                  H_S ← (H_S ∪ Post(C)) ∩ S
27                  T_S ← (T_S ∪ Pre(C)) ∩ S
28                  X ← X ∪ {S} ∪ {C}
29  return GraphReach(G, ⋃_{C ∈ goodC} C)
```

*Identifying SCCs.* Let S be the vertex set removed from X in a fixed iteration of Algorithm StreettGraphImpr after the removal of bad vertices in the inner while-loop. First note that if S is strongly connected and contains at least one edge, then it is a good component. If the set S was already identified as strongly connected in a previous iteration, i.e., H_S and T_S are empty, then S is identified as a good component in line 12. If many vertices of S have lost adjacent edges since the last time a superset of S was identified as a strongly connected subgraph, then the SCCs of G[S] are determined as in the basic algorithm. To achieve the optimal asymptotic upper bound, we say that many vertices of S have lost adjacent edges when |H_S| + |T_S| ≥ m/log n holds, while lower thresholds are used in our experimental results. Otherwise, if not too many vertices of S lost adjacent edges, then we start a symbolic *lock-step search* for top SCCs from the vertices of H_S and for bottom SCCs from the vertices of T_S using Procedure Lock-Step-Search. The set returned by the procedure is either a top or a bottom SCC C of G[S] (Theorem 1). Therefore we can from now on consider C and S\C separately, maintaining Invariants 1 and 2.
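The lock-step scheduling can be illustrated with an explicit-state sketch. The paper's Procedure Lock-Step-Search operates on symbolic sets via Pre/Post; here plain BFS frontiers stand in for them, the function name and graph encoding are our own, and correctness of the returned set as a top or bottom SCC relies on the paper's Invariant 1, which this sketch does not check:

```python
def lock_step_search(s, succ, pred, h_s, t_s):
    """Interleave a backward search from each vertex in H_S (candidate top
    SCCs) and a forward search from each vertex in T_S (candidate bottom
    SCCs), advancing every search by one frontier expansion per round, and
    return the set of the first search that converges."""
    searches = [({v}, {v}, pred) for v in h_s & s]   # backward searches
    searches += [({v}, {v}, succ) for v in t_s & s]  # forward searches
    if not searches:
        return set()          # nothing lost any edges; caller handles this case
    while True:
        for i, (found, frontier, rel) in enumerate(searches):
            new = {w for u in frontier for w in rel.get(u, ()) if w in s} - found
            if not new:
                return found  # converged first: a top or bottom SCC of G[S]
            searches[i] = (found | new, new, rel)
```

For instance, if a previously strongly connected set {1, 2, 3, 4} lost the edge 4 → 1, then H_S = {1} and T_S = {4}, and the backward search from 1 converges first on the top SCC {1, 2}.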

*Algorithm* StreettGraphImpr*.* A succinct description of the pseudocode is as follows: Lines 1–2 initialize the set of candidates for good components with the SCCs of the input graph. In each iteration of the main while-loop one candidate is considered and the following operations are performed: (a) lines 5–10 iteratively remove all bad vertices; if afterwards the candidate is still strongly connected (and contains at least one edge), it is identified as a good component in the next step; otherwise it is partitioned into new candidates in one of the following ways: (b) if many vertices lost adjacent edges, lines 13–17 partition the candidate into its SCCs (this corresponds to an iteration of the basic algorithm); (c) otherwise, lines 20–28 use symbolic lock-step search to partition the candidate into one of its SCCs and the remaining vertices. The while-loop terminates when no candidates are left. Finally, vertices that can reach some good component are returned. We have the following result (proof in [13, Appendix B]).

**Theorem 2 (Improved Algorithm for Graphs).** *Algorithm* StreettGraphImpr *correctly computes the winning set in graphs with Streett objectives and requires* O(n · √(m log n)) *symbolic steps.*

# **5 Symbolic MEC Decomposition**

In this section we present a succinct description of the basic symbolic algorithm for MEC decomposition and then outline the main ideas of the improved algorithm.

*Basic Symbolic Algorithm for MEC Decomposition.* The basic symbolic algorithm for MEC decomposition maintains a set of identified MECs and a set of candidates for MECs, initialized with the SCCs of the MDP. Whenever a candidate is considered, either (a) it is identified as a MEC or (b) it contains vertices with outgoing random edges, which are then removed together with their random attractor from the candidate, and the SCCs of the remaining sub-MDP are added to the set of candidates. We refer to the algorithm as MECBasic.

**Proposition 2.** *Algorithm* MECBasic *correctly computes the MEC decomposition of MDPs and requires* O(n²) *symbolic steps.*
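Algorithm MECBasic also admits a small explicit-state sketch. The names, the naive SCC helper, and the encoding of player vertices (v1) versus random vertices (vr) are our own illustrative choices; `random_attractor` computes the random player's attractor inside a candidate:

```python
def sccs(vs, succ):
    # naive SCCs of the subgraph induced by vs: SCC(v) = Fwd(v) ∩ Bwd(v)
    pred = {v: set() for v in vs}
    for u in vs:
        for w in succ.get(u, ()):
            if w in vs:
                pred[w].add(u)
    def reach(v, rel):
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in rel.get(u, ()):
                if w in vs and w not in seen:
                    seen.add(w); stack.append(w)
        return seen
    comps, left = [], set(vs)
    while left:
        v = next(iter(left))
        c = reach(v, succ) & reach(v, pred)
        comps.append(c); left -= c
    return comps

def mec_decomposition(v1, vr, succ):
    """Sketch of Algorithm MECBasic: candidates are SCCs; a candidate with a
    random vertex that has an edge leaving it loses that vertex together with
    its random attractor, and the remaining SCCs become new candidates."""
    def random_attractor(target, dom):
        z, changed = set(target), True
        while changed:
            changed = False
            for v in dom - z:
                ws = [w for w in succ.get(v, ()) if w in dom]
                if (v in vr and any(w in z for w in ws)) or \
                   (v in v1 and ws and all(w in z for w in ws)):
                    z.add(v); changed = True
        return z
    mecs, work = [], sccs(v1 | vr, succ)
    while work:
        c = work.pop()
        out = {v for v in c & vr if any(w not in c for w in succ.get(v, ()))}
        if out:
            work.extend(sccs(c - random_attractor(out, c), succ))
        elif any(w in c for v in c for w in succ.get(v, ())):
            mecs.append(c)  # strongly connected, no random escapes, has an edge
    return mecs
```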

*Improved Symbolic Algorithm for MEC Decomposition.* The improved symbolic algorithm for MEC decomposition uses the ideas of symbolic lock-step search presented in Sect. 3. Informally, when considering a candidate that lost a few edges from the remaining graph, we use the symbolic lock-step search to identify some bottom SCC. We refer to the algorithm as MECImpr. Since all the important conceptual ideas regarding the symbolic lock-step search are described in Sect. 3, we relegate the technical details to [13, Appendix C]. We summarize the main result (proof in [13, Appendix C]).

**Theorem 3 (Improved Algorithm for MEC).** *Algorithm* MECImpr *correctly computes the MEC decomposition of MDPs and requires* O(n · √m) *symbolic steps.*

# **6 MDPs with Streett Objectives**

**Basic Symbolic Algorithm.** We refer to the basic symbolic algorithm for MDPs with Streett objectives as StreettMDPbasic, which is similar to the algorithm for graphs, with SCC computation replaced by MEC computation. The pseudocode of Algorithm StreettMDPbasic together with its detailed description is presented in [13, Appendix D].

**Proposition 3.** *Algorithm* StreettMDPbasic *correctly computes the almost-sure winning set in MDPs with Streett objectives and requires* O(n² · min(n, k)) *symbolic steps.*

*Remark.* The above bound uses the basic symbolic MEC decomposition algorithm. Using our improved symbolic MEC decomposition algorithm, the above bound could be improved to O(n · √m · min(n, k)).

**Improved Symbolic Algorithm.** We refer to the improved symbolic algorithm for MDPs with Streett objectives as StreettMDPimpr. First we present the main ideas for the improved symbolic algorithm. Then we explain the key differences compared to the improved symbolic algorithm for graphs. A thorough description with the technical details and proofs is presented in [13, Appendix D].

- *Intuition of interleaved computation.* Consider a candidate for a good end-component S after a random attractor to some bad vertices is removed from it. After the removal of the random attractor, the set S does not have random vertices with outgoing edges. Assume further that Bad(S) = ∅ holds. If S is strongly connected and contains an edge, then it is a good end-component. If S is not strongly connected, then P[S] contains at least two SCCs and some of them might have random vertices with outgoing edges. Since end-components are strongly connected and do not have random vertices with outgoing edges, we have that (1) every good end-component is completely contained in one of the SCCs of P[S] and (2) the random vertices of an SCC with outgoing edges and their random attractor do not intersect with any good end-component (see Lemma 2).

- *Modification from basic to improved algorithm.* We use these observations to modify the basic algorithm as follows: First, for the sets that are candidates for good end-components, we do not maintain the property that they are end-components, but only that they do not have random vertices with outgoing edges (it still holds that every maximal good end-component is either already identified or contained in one of the candidate sets). Second, for a candidate set S, we repeat the removal of bad vertices until Bad(S) = ∅ holds before we continue with the next step of the algorithm. This allows us to make progress after the removal of bad vertices by computing all SCCs (instead of MECs) of the remaining sub-MDP. If there is only one SCC, then this is a good end-component (if it contains at least one edge). Otherwise (a) we remove from each SCC the set of random vertices with outgoing edges and their random attractor and (b) add the remaining vertices of each SCC as a new candidate set.

- As for the improved symbolic algorithm for graphs, we use the symbolic lock-step search to quickly identify a top or bottom SCC every time a candidate has lost a small number of edges since the last time its superset was identified as being strongly connected. The symbolic lock-step search is described in detail in Sect. 3.

Using interleaved MEC computation and lock-step search leads to a similar algorithmic structure for Algorithm StreettMDPimpr as for our improved symbolic algorithm for graphs (Algorithm StreettGraphImpr). The key differences are as follows: First, the set of candidates for good end-components is initialized with the MECs of the input graph instead of the SCCs. Second, whenever bad vertices are removed from a candidate, also their random attractor is removed. Further, whenever a candidate is partitioned into its SCCs, for each SCC, the random attractor of the vertices with outgoing random edges is removed. Finally, whenever a candidate S is separated into C and S\C via symbolic lock-step search, the random attractor of the vertices with outgoing random edges is removed from C, and the random attractor of C is removed from S.

**Theorem 4 (Improved Algorithm for MDPs).** *Algorithm* StreettMDPimpr *correctly computes the almost-sure winning set in MDPs with Streett objectives and requires* O(n · √(m log n)) *symbolic steps.*

# **7 Experiments**

We present a basic prototype implementation of our algorithm and compare against the basic symbolic algorithm for graphs and MDPs with Streett objectives.

*Models.* We consider the academic benchmarks from the VLTS benchmark suite [21], which gives representative examples of systems with nondeterminism, and has been used in previous experimental evaluation (such as [4,11]).

*Specifications.* We consider random LTL formulae and use the tool Rabinizer [28] to obtain deterministic Rabin automata. Then the negations of the formulae give us Streett automata, which we consider as the specifications.

*Graphs.* For the models of the academic benchmarks, we first compute SCCs, as all algorithms for Streett objectives compute SCCs as a preprocessing step. For the SCCs of the model benchmarks we consider products with the specification Streett automata to obtain graphs with Streett objectives, which are the benchmark examples for our experimental evaluation. The number of transitions in the benchmarks ranges from 300K to 5 million.

*MDPs.* For MDPs, we consider the graphs obtained as above and turn a fraction of the vertices, chosen uniformly at random, into random vertices. We consider 10%, 20%, and 50% random vertices in different experimental evaluations.

**Fig. 2.** Results for graphs with Streett objectives.

*Experimental Evaluation.* In the experimental evaluation we compare the number of symbolic steps (i.e., the number of Pre/Post operations<sup>2</sup>) executed by the algorithms; the comparison of running times yields similar results and is provided in [13, Appendix E]. As the initial preprocessing step is the same for all the algorithms (computing all SCCs for graphs and all MECs for MDPs), the comparison presents the number of symbolic steps executed after the preprocessing. The experimental results for graphs are shown in Fig. 2 and the experimental results for MDPs are shown in Fig. 3 (in each figure the two lines represent equality and an order-of-magnitude improvement, respectively).

*Discussion.* Note that the lock-step search is the key reason for the theoretical improvement; however, the improvement relies on a large number of Streett pairs.

<sup>2</sup> Recall that the basic set operations are cheaper to compute, and their number is asymptotically at most the number of Pre/Post operations in all the presented algorithms.

**Fig. 3.** Results for MDPs with Streett objectives.

In the experimental evaluation, the LTL formulae generate Streett automata with a small number of pairs, which after the product with the model accounts for an even smaller fraction of pairs as compared to the size of the state space. This has two effects:


In contrast to graphs, in MDPs even with a small number of pairs compared to the state space, the interleaved MEC computation has a notable effect on practical performance, and we observe performance improvements even in large MDPs.

# **8 Conclusion**

In this work we consider symbolic algorithms for graphs and MDPs with Streett objectives, as well as for MEC decomposition. Our algorithmic bounds match for both graphs and MDPs. In contrast, while SCCs can be computed in linearly many symbolic steps, no such algorithm is known for MEC decomposition. An interesting direction of future work is to explore further improved symbolic algorithms for MEC decomposition. Further improved symbolic algorithms for graphs and MDPs with Streett objectives are another interesting direction.

**Acknowledgements.** K. C. and M. H. are partially supported by the Vienna Science and Technology Fund (WWTF) grant ICT15-003. K. C. is partially supported by the Austrian Science Fund (FWF): S11407-N23 (RiSE/SHiNE), and an ERC Start Grant (279307: Graph Games). V. T. is partially supported by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement No. 665385. V. L. is partially supported by the Austrian Science Fund (FWF): S11408-N23 (RiSE/SHiNE), the ISF grant #1278/16, and an ERC Consolidator Grant (project MPM). For M. H. and V. L. the research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 340506.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Attracting Tangles to Solve Parity Games**

Tom van Dijk(B)

Formal Models and Verification, Johannes Kepler University, Linz, Austria tom.vandijk@jku.at

**Abstract.** Parity games have important practical applications in formal verification and synthesis, especially to solve the model-checking problem of the modal mu-calculus. They are also interesting from the theory perspective, because they are widely believed to admit a polynomial solution, but so far no such algorithm is known.

We propose a new algorithm to solve parity games based on learning tangles, which are strongly connected subgraphs for which one player has a strategy to win all cycles in the subgraph. We argue that tangles play a fundamental role in the prominent parity game solving algorithms. We show that tangle learning is competitive in practice and the fastest solver for large random games.

### **1 Introduction**

Parity games are turn-based games played on a finite graph. Two players *Odd* and *Even* play an infinite game by moving a token along the edges of the graph. Each vertex is labeled with a natural number *priority* and the winner of the game is determined by the parity of the highest priority that is encountered infinitely often. Player Odd wins if this parity is odd; otherwise, player Even wins.
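The winning condition is easy to state for eventually-periodic plays, which suffice for memoryless strategies: only the priorities on the repeated cycle occur infinitely often, so the parity of the highest cycle priority decides. A tiny illustrative helper (names and encoding ours):

```python
def play_winner(prefix, cycle, pr):
    """Winner of the eventually-periodic play prefix · cycle^ω.
    The prefix is visited only finitely often and is therefore irrelevant;
    the parity of the highest priority on the cycle determines the winner."""
    top = max(pr[v] for v in cycle)
    return 'Even' if top % 2 == 0 else 'Odd'
```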

Parity games are interesting both for their practical applications and for complexity theoretic reasons. Their study has been motivated by their relation to many problems in formal verification and synthesis that can be reduced to the problem of solving parity games, as parity games capture the expressive power of nested least and greatest fixpoint operators [11]. In particular, deciding the winner of a parity game is polynomial-time equivalent to checking non-emptiness of non-deterministic parity tree automata [21], and to the explicit model-checking problem of the modal μ-calculus [9,15,20].

Parity games are interesting in complexity theory, as the problem of determining the winner of a parity game is known to lie in UP ∩ co-UP [16], which is contained in NP ∩ co-NP [9]. This problem is therefore unlikely to be NP-complete and it is widely believed that a polynomial solution exists. Despite much effort, such an algorithm has not yet been found.

T. van Dijk—The author is supported by the FWF, NFN Grant S11408-N23 (RiSE).

© The Author(s) 2018 H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 198–215, 2018. https://doi.org/10.1007/978-3-319-96142-2_14

The main contribution of this paper is based on the notion of a *tangle*. A tangle is a strongly connected subgraph of a parity game for which one of the players has a strategy to win all cycles in the subgraph. We propose this notion and relate it to dominions and cycles in a parity game. Tangles are related to snares [10] and quasi-dominions [3], with the critical difference that tangles are strongly connected, whereas snares and quasi-dominions may be unconnected as well as contain vertices that are not in any cycles. We argue that tangles play a fundamental role in various parity game algorithms, in particular in priority promotion [3,5], Zielonka's recursive algorithm [25], strategy improvement [10,11,24], small progress measures [17], and in the recently proposed quasi-polynomial time progress measures [6,12].

The core insight of this paper is that tangles can be used to attract sets of vertices at once, since the losing player is forced to escape a tangle. This leads to a novel algorithm to solve parity games called *tangle learning*, which is based on searching for tangles along a top-down α-maximal decomposition of the parity game. New tangles are then attracted in the next decomposition. This naturally leads to learning nested tangles and, eventually, finding dominions. We prove that tangle learning solves parity games and present several extensions to the core algorithm, including *alternating* tangle learning, where the two players take turns maximally searching for tangles in their regions, and *on-the-fly* tangle learning, where newly learned tangles immediately refine the decomposition.

We relate the complexity of tangle learning to the number of learned tangles before finding a dominion, which is related to how often the solver is distracted by paths to higher winning priorities that are not suitable strategies.

We evaluate tangle learning in a comparison based on the parity game solver Oink [7], using the benchmarks of Keiren [19] as well as random parity games of various sizes. We compare tangle learning to priority promotion [3,5] and to Zielonka's recursive algorithm [25] as implemented in Oink.

# **2 Preliminaries**

Parity games are two-player turn-based infinite-duration games over a finite directed graph G = (V,E), where every vertex belongs to exactly one of two players called player *Even* and player *Odd*, and where every vertex is assigned a natural number called the *priority*. Starting from some initial vertex, a play of both players is an infinite path in G where the owner of each vertex determines the next move. The winner of such an infinite play is determined by the parity of the highest priority that occurs infinitely often along the play.

More formally, a parity game is a tuple (V_◊, V_□, E, pr) where V = V_◊ ∪ V_□ is a set of vertices partitioned into the sets V_◊ controlled by player *Even* and V_□ controlled by player *Odd*, and E ⊆ V × V is a left-total binary relation describing all moves, i.e., every vertex has at least one successor. We also write E(u) for all successors of u and u → v for v ∈ E(u). The function pr: V → {0, 1,...,d} assigns to each vertex a *priority*, where d is the highest priority in the game.

We write pr(v) for the priority of a vertex v, pr(V) for the highest priority of vertices V, and pr(⅁) for the highest priority in the game ⅁. Furthermore, we write pr⁻¹(i) for all vertices with the priority i. A *path* π = v₀v₁ ... is a sequence of vertices consistent with E, i.e., vᵢ → vᵢ₊₁ for all successive vertices. A *play* is an infinite path. We denote with inf(π) the vertices in π that occur infinitely many times in π. Player Even wins a play π if pr(inf(π)) is even; player Odd wins if pr(inf(π)) is odd. We write Plays(v) to denote all plays starting at vertex v.

A *strategy* σ : V_α → V is a partial function that assigns to each vertex in its domain a single successor in E, i.e., σ ⊆ E. We refer to a strategy of player α to restrict the domain of σ to V_α. In the remainder, all strategies σ are of a player α. We write Plays(v, σ) for the set of plays from v consistent with σ, and Plays(V, σ) for {π ∈ Plays(v, σ) | v ∈ V}.

A fundamental result for parity games is that they are memoryless determined [8], i.e., each vertex is either winning for player Even or for player Odd, and both players have a strategy for their winning vertices. Player α wins vertex v if they have a strategy σ such that all plays in Plays(v, σ) are winning for player α.

Several algorithms for solving parity games employ *attractor computation*. Given a set of vertices A, the attractor of A for a player α represents those vertices from which player α can force a play to visit A. We write Attr^⅁_α(A) to attract vertices in ⅁ to A as player α, i.e.,

μZ. A ∪ {v ∈ V_α | E(v) ∩ Z ≠ ∅} ∪ {v ∈ V_ᾱ | E(v) ⊆ Z}

Informally, we compute the α-attractor of A with a backward search from A, initially setting Z := A and iteratively adding α-vertices with a successor in Z and ᾱ-vertices with no successors outside Z. We also obtain a strategy σ for player α, starting with an empty strategy, by selecting a successor in Z when we attract vertices of player α and when the backward search finds a successor in Z for the α-vertices in A. We call a set of vertices A α-maximal if A = Attr^⅁_α(A).
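The backward search can be sketched with a worklist over an explicit graph (an illustrative version; the encoding with an `owner` map and the function name are ours, and here only attracted α-vertices receive a strategy entry):

```python
def attractor(vertices, succ, owner, alpha, a):
    """α-attractor of target set `a`: iteratively add α-vertices with a
    successor in Z and opponent vertices whose successors all lie in Z,
    recording a strategy choice for the attracted α-vertices."""
    pred = {v: set() for v in vertices}
    for u in vertices:
        for w in succ[u]:
            pred[w].add(u)
    z, strategy, queue = set(a), {}, list(a)
    while queue:
        v = queue.pop()
        for u in pred[v]:
            if u in z:
                continue
            if owner[u] == alpha:
                z.add(u); strategy[u] = v; queue.append(u)  # one escape into Z suffices
            elif all(w in z for w in succ[u]):
                z.add(u); queue.append(u)                   # opponent cannot avoid Z
    return z, strategy
```

For example, with v1 owned by Even, v2 owned by Odd, edges v1 → v2, v2 → v1, v2 → v3, and a self-loop on v3, the Odd-attractor of {v3} is the whole game, while the Even-attractor of {v3} is just {v3}, since Odd can move from v2 back to v1.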

A *dominion* D is a set of vertices for which player α has a strategy σ such that all plays consistent with σ stay in D and are winning for player α. We also write a *p-dominion* for a dominion where p is the highest priority encountered infinitely often in plays consistent with σ, i.e., p := max{pr(inf(π)) | π ∈ Plays(D, σ)}.

# **3 Tangles**

**Definition 1.** *A* p-tangle *is a nonempty set of vertices* U ⊆ V *with* p = *pr*(U)*, for which player* α ≡₂ p *has a strategy* σ : U_α → U*, such that the graph* (U, E′)*, with* E′ := E ∩ (σ ∪ (U_ᾱ × U))*, is strongly connected and player* α *wins all cycles in* (U, E′)*.*

Informally, a tangle is a set of vertices for which player α has a strategy to win all cycles inside the tangle. Thus, player ᾱ loses all plays that stay in U and is therefore forced to escape the tangle. The highest priority by which player α wins a play in (U, E′) is p. We make several basic observations related to tangles.


Observation 1 follows by definition. Observation 2 follows from the fact that dominions won by player α with some strategy σ must contain strongly connected subgraphs where all cycles are won by player α and the highest winning priority is p. For observation 3, consider a p-tangle for which player ᾱ has a strategy that avoids priority p while staying in the tangle. Then there is a p′-tangle with p′ < p in which player ᾱ also loses.

We can in fact find a hierarchy of tangles in any dominion D with winning strategy σ by computing the set of winning priorities {pr(inf(π)) | π ∈ Plays(D, σ)}. There is a p-tangle in D for every p in this set. Tangles are thus a natural substructure of dominions.

See for example Fig. 1. Player Odd wins this dominion with highest priority 5 and strategy {**d** → **e**}. Player Even can also avoid priority 5 and then loses with priority 3. The 5-dominion {**a**, **<sup>b</sup>**, **<sup>c</sup>**, **<sup>d</sup>**, **<sup>e</sup>**} contains the 5-tangle {**b**, **<sup>c</sup>**, **<sup>d</sup>**, **<sup>e</sup>**} and the 3-tangle {**c**, **<sup>e</sup>**}.

**Fig. 1.** A 5-dominion with a 5-tangle and a 3-tangle

### **4 Solving by Learning Tangles**

Since player ᾱ must escape tangles won by player α, we can treat a tangle as an abstract vertex controlled by player ᾱ that can be attracted by player α, thus attracting all vertices of the tangle. This section proposes the *tangle learning* algorithm, which searches for tangles along a top-down α-maximal decomposition of the game. We extend the attractor to attract all vertices in a tangle when player ᾱ is forced to play from the tangle to the attracting set. After extracting new tangles from regions in the decomposition, we iteratively repeat the procedure until a dominion is found. We show that tangle learning solves parity games.

#### **4.1 Attracting Tangles**

Given a tangle t, we denote its vertices simply by t and its witness strategy by σ_T(t). We write E_T(t) for the edges from ᾱ-vertices in the tangle to the rest of the game: E_T(t) := {v | u → v ∧ u ∈ t ∩ V_ᾱ ∧ v ∈ V \ t}. We write T_□ for all tangles t where pr(t) is odd (won by player Odd) and T_◊ for all tangles where pr(t) is even. We write TAttr^{⅁,T}_α(A) to attract vertices in ⅁ and vertices of tangles in T_α to A as player α, i.e.,

$$\begin{aligned} \mu Z. A \cup \{ v \in V\_{\alpha} \mid E(v) \cap Z \neq \emptyset \} \cup \{ v \in V\_{\overline{\alpha}} \mid E(v) \subseteq Z \} \\ \cup \{ v \in t \mid t \in T\_{\alpha} \land E\_{T}(t) \neq \emptyset \land E\_{T}(t) \subseteq Z \} \end{aligned}$$

```
1 def solve(⅁):
2     W_◊ ← ∅, W_□ ← ∅, σ_◊ ← ∅, σ_□ ← ∅, T ← ∅
3     while ⅁ ≠ ∅ :
4         T, d ← search(⅁, T)
5         α ← pr(d) mod 2
6         D, σ ← Attr^⅁_α(d)
7         W_α ← W_α ∪ D, σ_α ← σ_α ∪ σ_T(d) ∪ σ
8         ⅁ ← ⅁ \ D, T ← T ∩ (⅁ \ D)
9     return W_◊, W_□, σ_◊, σ_□
```
**Algorithm 1.** The solve algorithm which computes the winning regions and winning strategies for both players of a given parity game.

This approach is not the same as the subset construction. Indeed, we do not add the tangle itself but rather add all its vertices together. Notice that this attractor does not guarantee arrival in A, as player ᾱ can stay in the added tangle, but then player ᾱ loses.
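A minimal explicit-state sketch of this tangle attractor (illustrative; the paper's TAttr works on symbolic sets and also maintains a witness strategy, which we omit here, and tangles are passed simply as vertex sets assumed to be won by player alpha):

```python
def tangle_attractor(vertices, succ, owner, alpha, a, tangles):
    """Plain α-attractor extended so that all vertices of an α-tangle are
    attracted at once as soon as every escape edge of the opponent out of
    the tangle leads into Z: the opponent must then enter Z or stay in the
    tangle and lose."""
    z, changed = set(a), True
    while changed:
        changed = False
        for u in set(vertices) - z:
            if owner[u] == alpha and any(w in z for w in succ[u]):
                z.add(u); changed = True
            elif owner[u] != alpha and all(w in z for w in succ[u]):
                z.add(u); changed = True
        for t in tangles:
            esc = {w for u in t if owner[u] != alpha
                     for w in succ[u] if w not in t}
            if not (t <= z) and esc and esc <= z:
                z |= t          # attract the whole tangle at once
                changed = True
    return z
```

On a game where Even (player 0) attracts to {d} and the opponent vertex b has its only escape b → d, the plain attractor stops at {d}, while passing the Even-tangle {a, b} pulls in both tangle vertices.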

To compute a witness strategy σ for player α, as with Attr^⅁_α, we select a successor in Z when attracting single vertices of player α and when we find a successor in Z for the α-vertices in A. When we attract vertices of a tangle, we update σ for each tangle t sequentially, by updating σ with the strategy in σ_T(t) of those α-vertices in the tangle for which we do not yet have a strategy in σ, i.e., {(u, v) ∈ σ_T(t) | u ∉ dom(σ)}. This is important since tangles can overlap.

In the following, we call a set of vertices A α-maximal if A = TAttr^{⅁,T}_α(A). Given a game ⅁ and a set of vertices U, we write ⅁ ∩ U for the subgame where V′ := V ∩ U and E′ := E ∩ (V′ × V′). Given a set of tangles T and a set of vertices U, we write T ∩ U for all tangles with all vertices in U, i.e., {t ∈ T | t ⊆ U}, and we extend this notation to T ∩ ⅁ for the tangles in the game ⅁, i.e., T ∩ V.

### **4.2 The** solve **Algorithm**

We solve parity games by iteratively searching and removing a dominion of the game, as in [3,18,22]. See Algorithm 1. The search algorithm (described below) is given a game and a set of tangles and returns an updated set of tangles and a tangle d that is a dominion. Since the dominion d is a tangle, we derive the winner α from the highest priority (line 5) and use standard attractor computation to compute a dominion D (line 6). We add the dominion to the winning region of player α (line 7). We also update the winning strategy of player α using the witness strategy of the tangle d plus the strategy σ obtained during attractor computation. To solve the remainder, we remove all solved vertices from the game and we remove all tangles that contain solved vertices (line 8). When the entire game is solved, we return the winning regions and winning strategies of both players (line 9). Reusing the (pruned) set of tangles for the next search call is optional; if search is always called with an empty set of tangles, the "forgotten" tangles would be found again.

```
 1 def search(⅁, T):
 2     while true :
 3         r ← ∅, Y ← ∅
 4         while ⅁ \ r ≠ ∅ :
 5             ⅁′ ← ⅁ \ r, T′ ← T ∩ (⅁ \ r)
 6             p ← pr(⅁′), α ← pr(⅁′) mod 2
 7             Z, σ ← TAttr^{⅁′,T′}_α({v ∈ ⅁′ | pr(v) = p})
 8             A ← extract-tangles(Z, σ)
 9             if ∃ t ∈ A: E_T(t) = ∅ : return T ∪ Y, t
10             r ← r ∪ {Z ↦ p}, Y ← Y ∪ A
11         T ← T ∪ Y
```
**Algorithm 2.** The search algorithm which, given a game and a set of tangles, returns the updated set of tangles and a tangle that is a dominion.

### **4.3 The** search **Algorithm**

The search algorithm is given in Algorithm 2. The algorithm iteratively computes a top-down decomposition of ⅁ into sets of vertices called *regions*, such that each region is α-maximal for the player α who wins the highest priority in the region. Each next region in the remaining subgame ⅁′ is obtained by taking all vertices with the highest priority p in ⅁′ and computing the tangle attractor set of these vertices for the player that wins that priority, i.e., player α ≡₂ p. As every next region has a lower priority, each region is associated with a unique priority p. We record the current region of each vertex in an auxiliary partial function r: V → {0, 1, ..., d} called the region function. We record the new tangles found during each decomposition in the set Y.

In each iteration of the decomposition, we first obtain the current subgame ⅁′ (line 5) and the top priority p in ⅁′ (line 6). We compute the next region by attracting (with tangles) to the vertices of priority p in ⅁′ (line 7). We use the procedure extract-tangles (described below) to obtain new tangles from the computed region (line 8). For each new tangle, we check if the set of outgoing edges to the full game E<sub>T</sub>(t) is empty. If E<sub>T</sub>(t) is empty, then we have a dominion and we terminate the procedure (line 9). If no dominions are found, then we add the new tangles to Y and update r (line 10). After fully decomposing the game into regions, we add all new tangles to T (line 11) and restart the procedure.

#### **4.4 Extracting Tangles from a Region**

To search for tangles in a given region A of player α with strategy σ, we first remove all vertices where the opponent ᾱ can play to lower regions (in ⅁′) while player α is constrained to σ, i.e., the greatest fixed point

$$\nu Z . A \cap \left( \{ v \in V\_{\overline{\alpha}} \mid E'(v) \subseteq Z \} \cup \{ v \in V\_{\alpha} \mid \sigma(v) \in Z \} \right)$$

This procedure can be implemented efficiently with a backward search, starting from all vertices of priority p that escape to lower regions. Since there can be multiple vertices of priority p, the reduced region may consist of multiple unconnected tangles. We compute all nontrivial bottom SCCs of the reduced region, restricted by the strategy σ. Every such SCC is a unique p-tangle.
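The reduction step (the greatest fixed point above) can be sketched as a simple iteration to a fixed point, assuming a minimal encoding: successor sets already restricted to the subgame, an owner map, and a strategy map σ for player α's vertices. All names are ours; the subsequent bottom-SCC computation (e.g., with Tarjan's algorithm) is omitted:

```python
def reduce_region(A, edges, owner, alpha, sigma):
    """νZ. A ∩ ({opponent v | E'(v) ⊆ Z} ∪ {α-vertex v | σ(v) ∈ Z}).

    Repeatedly discard vertices from which the opponent can leave Z,
    or whose σ-successor has left Z, until nothing changes."""
    Z = set(A)
    changed = True
    while changed:
        changed = False
        for v in sorted(Z):
            stays = sigma[v] in Z if owner[v] == alpha else edges[v] <= Z
            if not stays:
                Z.remove(v)
                changed = True
    return Z
```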

### **4.5 Tangle Learning Solves Parity Games**

We now prove properties of the proposed algorithm.

**Lemma 1.** *All regions recorded in r in Algorithm 2 are* α*-maximal in their subgame.*

*Proof.* This is vacuously true at the beginning of the search. Every region Z is α-maximal as Z is computed with *TAttr* (line 7). Therefore the lemma remains true when r is updated at line 10. New tangles are only added to <sup>T</sup> at line 11, after which r is reset to <sup>∅</sup>. 

**Lemma 2.** *All plays consistent with* σ *that stay in a region are won by player* α*.*

*Proof.* Based on how the attractor computes the region, we show that all cycles (consistent with σ) in the region are won by player α. Initially, Z only contains vertices with priority p; therefore, any cycles in Z are won by player α. We consider two cases: (a) When attracting a single vertex v, any new cycles must involve vertices with priority p from the initial set, since all other α-vertices in Z already have a strategy in Z and all other ᾱ-vertices in Z have only successors in Z, otherwise they would not be attracted to Z. Since p is the highest priority in the region, every new cycle is won by player α. (b) When attracting vertices of a tangle, we set the strategy for all attracted vertices of player α to the witness strategy of the tangle. Any new cycles either involve vertices with priority p (as above) or are cycles inside the tangle that are won by player α.

**Lemma 3.** *Player* α *can reach a vertex with the highest priority* p *from every vertex in the region, via a path in the region that is consistent with strategy* σ*.*

*Proof.* The proof is based on how the attractor computes the region. This property is trivially true for the initial set of vertices with priority p. We consider again two cases: (a) When attracting a single vertex v, v is either an α-vertex with a strategy to play to Z, or an ᾱ-vertex whose successors are all in Z. Therefore, the property holds for v. (b) Tangles are strongly connected w.r.t. their witness strategy. Therefore player α can reach every vertex of the tangle, and since the tangle is attracted to Z, at least one α-vertex can play to Z. Therefore, the property holds for all attracted vertices of a tangle.

**Lemma 4.** *For each new tangle* t*, all successors of* t *are in higher* α*-regions.*

*Proof.* For every bottom SCC B (computed in extract-tangles), E′(v) ⊆ B for all ᾱ-vertices v ∈ B, otherwise player ᾱ could leave B and B would not be a bottom SCC. Recall that E′(v) is restricted to edges in the subgame ⅁′ = ⅁ \ r. Therefore E(v) ⊆ dom(r) ∪ B in the full game for all ᾱ-vertices v ∈ B. Recall that E<sub>T</sub>(t) for a tangle t refers to all successors for player ᾱ that leave the tangle. Hence, E<sub>T</sub>(t) ⊆ dom(r) for every tangle t := B. Due to Lemma 1, no ᾱ-vertex in B can escape to a higher ᾱ-region. Thus E<sub>T</sub>(t) only contains vertices from higher α-regions when the new tangle is found by extract-tangles.

**Lemma 5.** *Every nontrivial bottom SCC* B *of the reduced region restricted by witness strategy* σ *is a unique* p*-tangle.*

*Proof.* All <sup>α</sup>-vertices <sup>v</sup> in <sup>B</sup> have a strategy <sup>σ</sup>(v) <sup>∈</sup> <sup>B</sup>, since <sup>B</sup> is a bottom SCC when restricted by σ. B is strongly connected by definition. Per Lemma 2, player α wins all plays consistent with σ in the region and therefore also in B. Thus, B is a tangle. Per Lemma 3, player α can always reach a vertex of priority p, therefore any bottom SCC must include a vertex of priority p. Since p is the highest priority in the subgame, B is a p-tangle. Furthermore, the tangle must be unique. If the tangle was found before, then per Lemmas 1 and 4, it would have been attracted to a higher <sup>α</sup>-region. 

**Lemma 6.** *The lowest region in the decomposition always contains a tangle.*

*Proof.* The lowest region is always nonempty after reduction in extract-tangles, as there are no lower regions. Furthermore, this region contains nontrivial bottom SCCs, since every vertex must have a successor in the region due to Lemma 1.

**Lemma 7.** *A tangle* t *is a dominion if and only if* E<sub>T</sub>(t) = ∅*.*

*Proof.* If the tangle is a dominion, then player ᾱ cannot leave it, therefore E<sub>T</sub>(t) = ∅. If E<sub>T</sub>(t) = ∅, then player ᾱ cannot leave the tangle and since all plays consistent with σ in the tangle are won by player α, the tangle is a dominion.

**Lemma 8.** E<sub>T</sub>(t) = ∅ *for every tangle* t *found in the highest region of player* α*.*

*Proof.* Per Lemma 4, E<sub>T</sub>(t) ⊆ {v ∈ dom(r) | r(v) ≡₂ p} when the tangle is found. There are no higher regions of player α, therefore E<sub>T</sub>(t) = ∅.

**Lemma 9.** *The search algorithm terminates by finding a dominion.*

*Proof.* There is always a highest region of one of the players that is not empty. If a tangle is found in this region, then it is a dominion (Lemmas 7 and 8) and Algorithm 2 terminates (line 9). If no tangle is found in this region, then the opponent can escape to a lower region. Thus, if no dominion is found in a highest region, then there is a lower region that contains a tangle (Lemma 6) that must be unique (Lemma 5). As there are only finitely many unique tangles, eventually a dominion must be found. 

**Lemma 10.** *The solve algorithm solves parity games.*

*Proof.* Every invocation of search returns a dominion of the game (Lemma 9). The α-attractor of a dominion won by player α is also a dominion of player α. Thus all vertices in D are won by player α. The winning strategy is derived from the witness strategy of d combined with the strategy obtained by the attractor at line 6. At the end of solve, every vertex of the game is either in W<sub>α</sub> or in W<sub>ᾱ</sub>.

### **4.6 Variations of Tangle Learning**

We propose three different variations of tangle learning that can be combined.

The first variation is *alternating tangle learning*, where players take turns to maximally learn tangles, i.e., in a turn of player Even, we only search for tangles in regions of player Even, until no more tangles are found. Then we search only for tangles in regions of player Odd, until no more tangles are found. When changing players, the last decomposition can be reused.

The second variation is *on-the-fly tangle learning*, where new tangles immediately refine the decomposition. When new tangles are found, the decomposition procedure is reset to the highest region that attracts one of the new tangles, such that all regions in the top-down decomposition remain α-maximal. This is the region with priority p := max{min{r(v) | v ∈ E<sub>T</sub>(t)} | t ∈ A}.
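The reset priority is a direct transcription of this formula; the encoding (r as a dict, escapes mapping each new tangle t to E<sub>T</sub>(t)) and all names are hypothetical:

```python
def reset_priority(r, escapes, new_tangles):
    """p := max{ min{ r(v) | v in E_T(t) } | t in new_tangles }."""
    return max(min(r[v] for v in escapes[t]) for t in new_tangles)
```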

A third variation skips the reduction step in extract-tangles and only extracts tangles from regions where none of the vertices of priority p can escape to lower regions. This still terminates by finding a dominion, as Lemma 6 still applies, i.e., we always extract tangles from the lowest region. Similar variations are also conceivable, such as only learning tangles from the lowest region.

### **5 Complexity**

We establish a relation between the time complexity of tangle learning and the number of *learned* tangles.

**Lemma 11.** *Computing the top-down* α*-maximal decomposition of a parity game runs in time* <sup>O</sup>(dm <sup>+</sup> dn|T|) *given a parity game with* <sup>d</sup> *priorities,* <sup>n</sup> *vertices and* m *edges, and a set of tangles* T*.*

*Proof.* The attractor *Attr*<sup>⅁</sup><sub>α</sub> runs in time O(n + m), if we record the number of remaining outgoing edges for each vertex [23]. The attractor *TAttr*<sup>⅁,T</sup><sub>α</sub> runs in time O(n + m + |T| + n|T|), if implemented in a similar style. As m ≥ n, we simplify to O(m + n|T|). Since the decomposition computes at most d regions, the decomposition runs in time O(dm + dn|T|).
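The O(n + m) attractor with per-vertex counters of remaining outgoing edges can be sketched as follows; the encoding (successor, predecessor and owner maps) and all names are our own, not the implementation of [23]:

```python
from collections import deque

def attr(vertices, edges, preds, owner, alpha, A):
    """Attr_alpha(A): an alpha-vertex joins as soon as one successor is
    attracted; an opponent vertex joins once all its successors are."""
    Z = set(A)
    remaining = {v: len(edges[v]) for v in vertices}  # opponent escape counters
    queue = deque(Z)
    while queue:
        w = queue.popleft()
        for v in preds[w]:
            if v in Z:
                continue
            if owner[v] == alpha:
                Z.add(v)
                queue.append(v)
            else:
                remaining[v] -= 1          # one fewer edge escaping Z
                if remaining[v] == 0:
                    Z.add(v)
                    queue.append(v)
    return Z
```

Each edge is inspected at most once (when its target is dequeued), giving the O(n + m) bound.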

**Lemma 12.** *Searching for tangles in the decomposition runs in time* O(dm)*.*

*Proof.* The extract-tangles procedure consists of a backward search, which runs in O(n + m), and an SCC search based on Tarjan's algorithm, which also runs in O(n + m). This procedure is performed at most d times (for each region). As m ≥ n, we simplify to O(dm).

**Lemma 13.** *Tangle learning runs in time* O(dnm|T| + dn<sup>2</sup>|T|<sup>2</sup>) *for a parity game with* d *priorities,* n *vertices,* m *edges, and* |T| learned *tangles.*

*Proof.* Given Lemmas 11 and 12, each iteration in search runs in time O(dm + dn|T|). The number of iterations is at most |T|, since we learn at least one tangle per iteration. Then search runs in time O(dm|T| + dn|T|<sup>2</sup>). Since each found dominion is then removed from the game, there are at most n calls to search. Thus tangle learning runs in time O(dnm|T| + dn<sup>2</sup>|T|<sup>2</sup>).

**Fig. 2.** A parity game that requires several turns to find a dominion.

The complexity of tangle learning follows from the number of tangles that are learned before each dominion is found. Often not all tangles in a game need to be learned to solve the game, only certain tangles. Whether this number can be exponential in the worst case is an open question. These tangles often serve to remove *distractions* that prevent the other player from finding better tangles. This concept is illustrated by the example in Fig. 2, which requires multiple turns before a dominion is found. The game contains 4 tangles: {**c**}, {**g**} (a dominion), {**a**, **b**, **c**, **d**} and {**a**, **e**}. The vertices {**e**, **f**, **g**, **h**} do not form a tangle, since the opponent wins the loop of vertex **g**. The tangle {**a**, **b**, **c**, **d**} is a dominion in the remaining game after *Attr*({**g**}) has been removed.

The tangle {**g**} is not found at first, as player Odd is distracted by **h**, i.e., prefers to play from **g** to **h**. Thus vertex **h** must first be attracted by the opponent. This occurs when player Even learns the tangle {**a**, **<sup>e</sup>**}, which is then attracted to **<sup>f</sup>**, which then attracts **<sup>h</sup>**. However, the tangle {**a**, **<sup>e</sup>**} is blocked, as player Even is distracted by **b**. Vertex **b** is attracted by player Odd when they learn the tangle {**c**}, which is attracted to **d**, which then attracts **b**. So player Odd must learn tangle {**c**} so player Even can learn tangle {**a**, **<sup>e</sup>**}, which player Even must learn so player Odd can learn tangle {**g**} and win the dominion {**e**,**f**, **<sup>g</sup>**, **<sup>h</sup>**}, after which player Odd also learns {**a**, **<sup>b</sup>**, **<sup>c</sup>**, **<sup>d</sup>**} and wins the entire game.

One can also understand the algorithm as the players learning that their opponent can now play from some vertex v via the learned tangle to a higher vertex w that is won by the opponent. In the example, we first learn that **b** actually leads to **d** via the learned tangle {**c**}. Now **b** is no longer safe for player Even. However, player Even can now play from both **d** and **h** via the learned 0-tangle {**a**, **<sup>e</sup>**} to **<sup>f</sup>**, so **<sup>d</sup>** and **<sup>h</sup>** are no longer interesting for player Odd and vertex **b** is again safe for player Even.

### **6 Implementation**

We implement four variations of tangle learning in the parity game solver Oink [7]. Oink is a modern implementation of parity game algorithms written in C++. Oink implements priority promotion [3], Zielonka's recursive algorithm [25], strategy improvement [11], small progress measures [17], and quasipolynomial time progress measures [12]. Oink also implements self-loop solving and winner-controlled winning cycle detection, as proposed in [23]. The implementation is publicly available via https://www.github.com/trolando/oink.

We implement the following variations of tangle learning: standard tangle learning (tl), alternating tangle learning (atl), on-the-fly tangle learning (otftl) and on-the-fly alternating tangle learning (otfatl). The implementation mainly differs from the presented algorithm in the following ways. We combine the solve and search algorithms in one loop. We remember the highest region that attracts a new tangle and reset the decomposition to that region instead of recomputing the full decomposition. In extract-tangles, we do not compute bottom SCCs for the highest region of a player, instead we return the entire reduced region as a single dominion (see also Lemma 8).

### **7 Empirical Evaluation**

The goal of the empirical evaluation is to study tangle learning and its variations on real-world examples and random games. Due to space limitations, we do not report in detail on crafted benchmark families (generated by PGSolver [13]), except that none of these games is difficult in runtime or number of tangles.

We use the parity game benchmarks from model checking and equivalence checking proposed by Keiren [19] that are publicly available online. These are 313 model checking and 216 equivalence checking games. We also consider random games, in part because the literature on parity games tends to favor studying the behavior of algorithms on random games. We include two classes of self-loop-free random games generated by PGSolver [13] with a fixed number of vertices:


We generate 20 games for each parameter N, in total 80 fully random games and 180 low out-degree games. All random games have N vertices and up to N distinct priorities. We include low out-degree games, since algorithms may behave differently on games where all vertices have few available moves, as also suggested in [3]. In fact, as we see in the evaluation, fully random games appear trivial to solve, whereas games with few moves per vertex are more challenging. Furthermore, the fully random games have fewer vertices but require more disk space (40 MB per compressed file for N = 7000) than large low out-degree games (11 MB per compressed file for N = 1000000).

We compare four variations of tangle learning to the implementations of Zielonka's recursive algorithm (optimized version of Oink) and of priority promotion (implemented in Oink by the authors of [3]). The motivation for this choice is that [7] shows that these are the fastest parity game solving algorithms.

In the following, we also use *cactus plots* to compare the algorithms. A cactus plot shows, for each algorithm, that X input games were each solved within Y seconds.
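The points of a cactus plot can be computed from a solver's per-game runtimes as follows (a sketch under our own naming; the 1200 s timeout matches the setup of Table 1):

```python
def cactus_points(times, timeout=1200.0):
    """After sorting a solver's per-game runtimes, the k-th fastest solved
    game yields the point (time_k, k); timed-out runs are excluded."""
    solved = sorted(t for t in times if t < timeout)
    return [(t, k) for k, t in enumerate(solved, start=1)]

# three games, one of which timed out
points = cactus_points([4.0, 0.3, 1500.0])
```

Plotting these points for every solver on one axis (time on x, games solved on y) gives the cactus plots of Fig. 3.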


**Table 1.** Runtimes in sec. and number of timeouts (20 min) of the solvers Zielonka (zlk), priority promotion (pp), and tangle learning (tl, atl, otftl, otfatl).

All experimental scripts and log files are available online via https://www. github.com/trolando/tl-experiments. The experiments were performed on a cluster of Dell PowerEdge M610 servers with two Xeon E5520 processors and 24 GB internal memory each. The tools were compiled with gcc 5.4.0.

### **7.1 Overall Results**

Table 1 shows the cumulative runtimes of the six algorithms. For the runs that timed out, we simply used the timeout value of 1200 s, but this underestimates the actual runtime.

### **7.2 Model Checking and Equivalence Checking Games**

See Fig. 3 for the cactus plot of the six solvers on model checking and equivalence checking games. This graph suggests that for most games, tangle learning is only slightly slower than the other algorithms. The tangle learning algorithms require at most 2× as much time for 12 of the 529 games. There is no significant difference between the four variations of tangle learning.

**Fig. 3.** Cactus plots of the solvers Zielonka (zlk), priority promotion (pp) and tangle learning (tl, atl, otftl, otfatl). The plot shows how many MC&EC games (top) or large random games (bottom) are (individually) solved within the given time.

The 529 games have on average 1.86 million vertices and 5.85 million edges, and at most 40.6 million vertices and 167.5 million edges. All equivalence checking games have 2 priorities, so every tangle is a dominion. The model checking games have 2 to 4 priorities. Tangle learning learns non-dominion tangles for only 30 games, and more than 1 tangle only for the 22 games that check the infinitely often read write property. The most extreme case is 1,572,864 tangles for a game with 19,550,209 vertices. These are all 0-tangles that are then attracted to become part of 2-dominions.

That priority promotion and Zielonka's algorithm perform well is no surprise. See also Sect. 8.4. Solving these parity games requires few iterations for all algorithms, but tangle learning spends more time learning and attracting individual tangles, which the other algorithms do not do. Zielonka requires at most 27 iterations, priority promotion at most 28 queries and 9 promotions. Alternating tangle learning requires at most 2 turns. We conclude that these games are not complex and that their difficulty is related to their sheer size.

### **7.3 Random Games**

Table 1 shows no differences between the algorithms for the fully random games. Tangle learning learns no tangles except dominions for any of these games. This agrees with the intuition that the vast number of edges in these games lets attractor-based algorithms quickly attract large portions of the game.

See Fig. 3 for a cactus plot of the solvers on the larger random games. Only 167 games were solved within 20 min each by Zielonka's algorithm and only 174 games by priority promotion. See Table 2 for details of the slowest 10 random games for alternating tangle learning. There is a clear correlation between the runtime, the number of tangles and the number of turns. One game is particularly interesting, as it requires significantly more time than the other games.

The presence of one game that is much more difficult is a feature of using random games. It is likely that if we generated a new set of random games, we would obtain different results. This could be ameliorated by experimenting on thousands of random games and even then it is still a game of chance whether some of these random games are significantly more difficult than the others.

**Table 2.** The 10 hardest random games for the atl algorithm, with time in seconds and size in number of vertices.

| Time    | 543   | 148   | 121 | 118 | 110  | 83  | 81  | 73  | 68  | 52  |
|---------|-------|-------|-----|-----|------|-----|-----|-----|-----|-----|
| Tangles | 4,018 | 1,219 | 737 | 560 | 939  | 337 | 493 | 309 | 229 | 384 |
| Turns   | 91    | 56    | 23  | 25  | 30   | 12  | 18  | 10  | 10  | 18  |
| Size    | 1M    | 1M    | 700K| 1M  | 700K | 1M  | 1M  | 1M  | 1M  | 1M  |

### **8 Tangles in Other Algorithms**

We argue that tangles play a fundamental role in various other parity game solving algorithms. We refer to [7] for descriptions of these algorithms.

### **8.1 Small Progress Measures**

The small progress measures algorithm [17] iteratively performs local updates to vertices until a fixed point is reached. Each vertex is equipped with some measure that records a statistic of the best game either player knows that they can play from that vertex so far. By updating measures based on the successors, they essentially play the game backwards. When they can no longer perform updates, the final measures indicate the winning player of each vertex.

The measures in small progress measures record how often each even priority is encountered along the best play (so far) until a higher priority is encountered. As argued in [7,14], player Even tries to visit vertices with even priorities as often as possible, while prioritizing plays with more higher even priorities. This often resets progress for lower priorities. Player Odd has the opposite goal, i.e., player Odd prefers to play to a lower even priority to avoid a higher even priority, even if the lower priority is visited infinitely often. When the measures record a play from some vertex that visits more vertices with some even priority than exist in the game, this indicates that player Even can force player Odd into a cycle, unless they concede and play to a higher even priority. A mechanism called cap-and-carryover [7] ensures via slowly rising measures that the opponent is forced to play to a higher even priority.

We argue that when small progress measures finds cycles of some priority p, this is due to the presence of a p-tangle, namely precisely those vertices whose measures increase beyond the number of vertices with priority p, since these measures can only increase so far in the presence of cycles out of which the opponent cannot escape except by playing to vertices with a higher even priority.

One can now understand small progress measures as follows. The algorithm indirectly searches for tangles of player Even, and then searches for the best escape for player Odd by playing to the lowest higher even priority. If no such escape exists for a tangle, then the measures eventually rise to ⊤, indicating that player Even has a dominion. Whereas tangle learning is affected by *distractions*, small progress measures is driven by the dual notion of *aversions*, i.e., high even vertices that player Odd initially tries to avoid. The small progress measures algorithm tends to find tangles repeatedly, especially when they are nested.

### **8.2 Quasi-polynomial Time Progress Measures**

The quasi-polynomial time progress measures algorithm [12] is similar to small progress measures. This algorithm records the number of dominating even vertices along a play, i.e., such that every two such vertices are higher than all intermediate vertices. For example, in the path 1213142321563212, pairs of dominating vertices of even priority are higher than all vertices between them. Higher even vertices are preferred, even if this (partially) resets progress on lower priorities.

Tangles play a similar role as with small progress measures. The presence of a tangle lets the value iteration procedure increase the measure up to the point where the other player "escapes" the tangle via a vertex outside of the tangle. This algorithm has a similar weakness to nested tangles, but it is less severe as progress on lower priorities is often retained. In fact, the lower bound game in [12], for which the quasi-polynomial time algorithm is slow, is precisely based on nested tangles and is easily solved by tangle learning.

### **8.3 Strategy Improvement**

As argued by Fearnley [10], tangles play a fundamental role in the behavior of strategy improvement. Fearnley writes that instead of viewing strategy improvement as a process that tries to increase valuations, one can view it as a process that tries to force "consistency with snares" [10, Sect. 6], i.e., as a process that searches for escapes from tangles.

### **8.4 Priority Promotion**

Priority promotion [3,5] computes a top-down α-maximal decomposition and identifies "closed α-regions", i.e., regions where the losing player cannot escape to lower regions. A closed α-region is essentially a collection of possibly unconnected tangles and vertices that are attracted to these tangles. Priority promotion then promotes the closed region to the lowest higher region that the losing player can play to, i.e., the lowest region that would attract one of the tangles in the region. Promoting is different from attracting, as tangles in a region can be promoted to a priority that they are not attracted to. Furthermore, priority promotion has no mechanism to remember tangles, so the same tangle can be discovered many times. This is somewhat ameliorated in extensions such as region recovery [2] and delayed promotion [1], which aim to decrease how often regions are recomputed.

Priority promotion has a good practical performance for games where computing and attracting individual tangles is not necessary, e.g., when tangles are only attracted once and all tangles in a closed region are attracted to the same higher region, as is the case with the benchmark games of [19].

### **8.5 Zielonka's Recursive Algorithm**

Zielonka's recursive algorithm [25] also computes a top-down α-maximal decomposition, but instead of attracting from lower regions to higher regions, the algorithm attracts from higher regions to tangles in the subgame. Essentially, the algorithm starts with the tangles in the lowest region and attracts from higher regions to these tangles. When vertices from a higher α-region are attracted to tangles of player α, progress for player α is reset. Zielonka's algorithm also has no mechanism to store tangles and games that are exponential for Zielonka's algorithm, such as in [4], are trivially solved by tangle learning.

### **9 Conclusions**

We introduced the notion of a tangle as a subgraph of the game where one player knows how to win all cycles. We showed how tangles and nested tangles play a fundamental role in various parity game algorithms. These algorithms are not explicitly aware of tangles and can thus repeatedly explore the same tangles. We proposed a novel algorithm called tangle learning, which identifies tangles in a parity game and then uses these tangles to attract sets of vertices at once. The key insight is that tangles can be used with the attractor to form bigger (nested) tangles and, eventually, dominions. We evaluated tangle learning in a comparison with priority promotion and Zielonka's recursive algorithm and showed that tangle learning is competitive for model checking and equivalence checking games, and outperforms other solvers for large random games.

We repeat Fearnley's assertion [10] that "a thorough and complete understanding of how snares arise in a game is a necessary condition for devising a polynomial time algorithm for these games". Fearnley also formulated the challenge to give a clear formulation of how the structure of tangles in a given game affects the difficulty of solving it. We propose that a difficult game for tangle learning must be one that causes alternating tangle learning to have many turns before a dominion is found.

**Acknowledgements.** We thank the anonymous referees for their helpful comments, Jaco van de Pol for the use of the computer cluster, and Armin Biere for generously supporting this research.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# SAT, SMT and Decision Procedures

# **Delta-Decision Procedures for Exists-Forall Problems over the Reals**

Soonho Kong1(B) , Armando Solar-Lezama<sup>2</sup>, and Sicun Gao<sup>3</sup>

> <sup>1</sup> Toyota Research Institute, Cambridge, USA
> soonho.kong@tri.global
> <sup>2</sup> Massachusetts Institute of Technology, Cambridge, USA
> asolar@csail.mit.edu
> <sup>3</sup> University of California, San Diego, USA
> sicung@ucsd.edu

**Abstract.** We propose <sup>δ</sup>-complete decision procedures for solving satisfiability of nonlinear SMT problems over real numbers that contain universal quantification and a wide range of nonlinear functions. The methods combine interval constraint propagation, counterexampleguided synthesis, and numerical optimization. In particular, we show how to handle the interleaving of numerical and symbolic computation to ensure delta-completeness in quantified reasoning. We demonstrate that the proposed algorithms can handle various challenging global optimization and control synthesis problems that are beyond the reach of existing solvers.

# **1 Introduction**

Much progress has been made in the framework of delta-decision procedures for solving nonlinear Satisfiability Modulo Theories (SMT) problems over real numbers [1,2]. Delta-decision procedures allow one-sided bounded numerical errors, which is a practically useful relaxation that significantly reduces the computational complexity of the problems. With such relaxation, SMT problems with hundreds of variables and highly nonlinear constraints (such as differential equations) have been solved in practical applications [3]. Existing work in this direction has focused on satisfiability of quantifier-free SMT problems. Going one level up, SMT problems with both free and universally quantified variables, which correspond to ∃∀-formulas over the reals, are much more expressive. For instance, such formulas can encode the search for robust control laws in highly nonlinear dynamical systems, a central problem in robotics. Non-convex, multi-objective, and disjunctive optimization problems can all be encoded as ∃∀-formulas, through the natural definition of "finding some x such that for all other x′, x is better than x′ with respect to certain constraints." Many other examples from various areas are listed in [4].

Counterexample-Guided Inductive Synthesis (CEGIS) [5] is a framework for program synthesis that can be applied to solve generic exists-forall problems. The idea is to break the process of solving ∃∀-formulas into a loop between *synthesis* and *verification*. The synthesis procedure finds solutions to the existentially quantified variables and gives the solutions to the verifier to see if they can be validated, or falsified by *counterexamples*. The counterexamples are then used as learned constraints for the synthesis procedure to find new solutions. This method has been shown effective for many challenging problems, frequently generating more optimized programs than the best manual implementations [5].
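The synthesis/verification loop can be sketched over a finite domain as follows (a toy illustration with an invented specification, not the authors' implementation; over the reals, the explicit enumeration below is exactly what stops working):

```python
def cegis(candidates, ys, spec):
    # Alternate synthesis (pick a candidate consistent with all learned
    # counterexamples) and verification (search for a falsifying y).
    counterexamples = []
    for _ in range(len(candidates) * len(ys) + 1):
        consistent = [x for x in candidates
                      if all(spec(x, y) for y in counterexamples)]
        if not consistent:
            return None              # no solution exists
        x = consistent[0]
        cex = next((y for y in ys if not spec(x, y)), None)
        if cex is None:
            return x                 # candidate verified against all y
        counterexamples.append(cex)  # learn the counterexample and retry
    return None

# Toy instance: find x in 9..0 with x*y <= y*y + 6 for all y in 0..9.
print(cegis(range(9, -1, -1), range(10), lambda x, y: x * y <= y * y + 6))  # → 5
```

On this instance the loop learns the counterexamples y = 1 and y = 2 before converging; the point is only the shape of the synthesize/verify/learn cycle.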

A direct application of CEGIS to decision problems over real numbers, however, suffers from several problems. CEGIS is complete in finite domains because it can explicitly enumerate solutions, which cannot be done in continuous domains. Also, CEGIS ensures progress by avoiding duplication of solutions, while due to numerical sensitivity, precise control over real numbers is difficult. In this paper we propose methods that bypass these difficulties.

We propose an integration of the CEGIS method in the branch-and-prune framework as a generic algorithm for solving nonlinear ∃∀-formulas over real numbers and prove that the algorithm is δ-complete. We achieve this goal by using CEGIS-based methods for turning universally-quantified constraints into pruning operators, which is then used in the branch-and-prune framework for the search for solutions on the existentially-quantified variables. In doing so, we take special care to ensure correct handling of numerical errors in the computation, so that δ-completeness can be established for the whole procedure.

The paper is organized as follows. We first review the background, and then present the details of the main algorithm in Sect. 3. We then give a rigorous proof of the δ-completeness of the procedure in Sect. 4. We demonstrate the effectiveness of the procedures on various global optimization and Lyapunov function synthesis problems in Sect. 5.

*Related Work.* Quantified formulas in real arithmetic can be solved using symbolic quantifier elimination (using cylindrical decomposition [6]), which is known to have impractically high complexity (double exponential [7]), and cannot handle problems with transcendental functions. State-of-the-art SMT solvers such as CVC4 [8] and Z3 [9] provide quantifier support [10–13], but they are limited to decidable fragments of first-order logic. Optimization Modulo Theories (OMT) is a new field that focuses on solving a restricted form of quantified reasoning [14–16], focusing on linear formulas. Generic approaches to solving exists-forall problems such as [17] are generally based on the CEGIS framework, and are not intended to achieve completeness. Solving quantified constraints has been explored in the constraint solving community [18]. In general, existing work has not proposed algorithms that intend to achieve any notion of completeness for quantified problems in nonlinear theories over the reals.

# **2 Preliminaries**

### **2.1 Delta-Decisions and CNF***∀***-Formulas**

We consider first-order formulas over real numbers that can contain arbitrary nonlinear functions that can be numerically approximated, such as polynomials, exponential, and trigonometric functions. Theoretically, such functions are called *Type-2 computable* functions [19]. We write this language as LR*<sup>F</sup>* , formally defined as:

**Definition 1 (The** LR*<sup>F</sup>* **Language).** *Let* F *be the set of Type-2 computable functions. We define* LR*<sup>F</sup> to be the following first-order language:*

$$\begin{aligned} t &:= x \mid f(t), \text{ where } f \in \mathcal{F}, \text{ possibly } 0 \text{-ary (constant)};\\ \varphi &:= t(x) > 0 \mid t(x) \ge 0 \mid \varphi \land \varphi \mid \varphi \lor \varphi \mid \exists x\_i \varphi \mid \forall x\_i \varphi. \end{aligned}$$

*Remark 1.* Negations are not needed as part of the base syntax, since they can be defined through arithmetic: ¬(t > 0) is simply −t ≥ 0. Similarly, an equality t = 0 is just t ≥ 0 ∧ −t ≥ 0. In this way we can put the formulas in normal forms that are easy to manipulate.

We will focus on the ∃∀-formulas in L<sup>R</sup>*<sup>F</sup>* in this paper. Decision problems for such formulas are equivalent to satisfiability of SMT with universally quantified variables, whose free variables are implicitly existentially quantified.

It is clear that, when the quantifier-free part of an ∃∀-formula is in Conjunctive Normal Form (CNF), we can always push the universal quantifiers inside each conjunct, since universal quantification commutes with conjunction. Thus the decision problem for any ∃∀-formula is equivalent to the satisfiability of formulas in the following normal form:

**Definition 2 (CNF**<sup>∀</sup> **Formulas in** <sup>L</sup><sup>R</sup>*<sup>F</sup>* **).** *We say an* <sup>L</sup><sup>R</sup>*<sup>F</sup> -formula* <sup>ϕ</sup> *is in the CNF*∀*, if it is of the form*

$$\varphi(x) := \bigwedge\_{i=0}^{m} \left( \forall y (\bigvee\_{j=0}^{k\_i} c\_{ij}(x, y)) \right) \tag{1}$$

*where* <sup>c</sup>ij *are atomic constraints. Each universally quantified conjunct of the formula, i.e.,*

$$\forall y (\bigvee\_{j=0}^{k\_i} c\_{ij}(x, y))$$

*is called a* ∀**-clause***. Note that* ∀*-clauses only contain disjunctions and no nested conjunctions. If all the* ∀*-clauses are vacuous (i.e., contain no universally quantified variables), we say* ϕ(*x*) *is a* ground SMT *formula.*

The algorithms described in this paper will assume that an input formula is in CNF<sup>∀</sup> form. We can now define the δ*-satisfiability* problems for CNF∀-formulas.

**Definition 3 (Delta-Weakening/Strengthening).** *Let* δ <sup>∈</sup> <sup>Q</sup><sup>+</sup> *be arbitrary. Consider an arbitrary CNF*∀*-formula of the form*

$$\varphi(x) := \bigwedge\_{i=0}^{m} \left( \forall y (\bigvee\_{j=0}^{k\_i} f\_{ij}(x, y) \circ 0) \right),$$

*where* ◦ ∈ {>, ≥}*. We define the* δ*-weakening of* ϕ(*x*) *to be:*

$$\varphi^{-\delta}(x) := \bigwedge\_{i=0}^{m} \left( \forall y (\bigvee\_{j=0}^{k\_i} f\_{ij}(x, y) \ge -\delta) \right).$$

*Namely, we weaken the right-hand sides of all atomic formulas from* <sup>0</sup> *to* <sup>−</sup>δ*. Note how the difference between strict and nonstrict inequality becomes unimportant in the* δ*-weakening. We also define its dual, the* δ*-strengthening of* ϕ(*x*)*:*

$$\varphi^{+\delta}(x) := \bigwedge\_{i=0}^{m} \left( \forall y (\bigvee\_{j=0}^{k\_i} f\_{ij}(x, y) \ge +\delta) \right).$$

Since the formulas in the normal form no longer contain negations, the relaxation of the atomic formulas is implied by the original formula (and is thus weaker), as shown in [1].

**Proposition 1.** *For any* ϕ *and* δ <sup>∈</sup> <sup>Q</sup><sup>+</sup>*,* <sup>ϕ</sup><sup>−</sup><sup>δ</sup> *is logically weaker, in the sense that* ϕ <sup>→</sup> ϕ<sup>−</sup><sup>δ</sup> *is always true, but not vice versa.*

*Example 1.* Consider the formula

$$
\forall y \, f(x, y) = 0.
$$

It is equivalent to the CNF∀-formula

$$(\forall y (-f(x,y) \ge 0) \land \forall y (f(x,y) \ge 0))$$

whose δ-weakening is of the form

$$(\forall y (-f(x,y) \ge -\delta) \land \forall y (f(x,y) \ge -\delta))$$

which is logically equivalent to

$$\forall y (|f(x,y)| \le \delta).$$

We see that the weakening of f(x, y) = 0 to |f(x, y)| <sup>≤</sup> δ defines a natural relaxation.

**Definition 4 (Delta-Completeness).** *Let* <sup>δ</sup> <sup>∈</sup> <sup>Q</sup><sup>+</sup> *be arbitrary. We say an algorithm is* <sup>δ</sup>*-complete for* ∃∀*-formulas in* <sup>L</sup><sup>R</sup>*<sup>F</sup> , if for any input CNF*∀*-formula* ϕ*, it always terminates and returns one of the following answers correctly:*

*–* **unsat***:* ϕ *is unsatisfiable.*
*–* δ**-sat***:* ϕ<sup>−δ</sup> *is satisfiable.*

*When the two cases overlap, it can return either answer.*



#### **2.2 The Branch-and-Prune Framework**

A practical algorithm that has been shown to be δ-complete for ground SMT formulas is the *branch-and-prune* method developed for interval constraint propagation [20]. A description of the algorithm in the simple case of an equality constraint is in Algorithm 1.

The procedure combines *pruning* and *branching* operations. Let B be the set of all boxes (each variable assigned to an interval), and C a set of constraints in the language. FixedPoint(g, B) is a procedure computing a fixedpoint of a function g : B→B with an initial input B. A pruning operation Prune : B×C →B takes a box B ∈ B and a constraint as input, and returns an ideally smaller box B′ ∈ B (Line 5) that is guaranteed to keep all solutions of the constraints, if any exist. When such pruning operations do not make progress, the Branch procedure picks a variable, divides its interval in half, and creates two sub-problems <sup>B</sup><sup>1</sup> and <sup>B</sup><sup>2</sup> (Line 8). The procedure terminates if either all boxes have been pruned to be empty (Line 15), or if a small box whose maximum width is smaller than a given threshold δ has been found (Line 11). In [2], it has been proved that Algorithm <sup>1</sup> is δ-complete iff the pruning operators satisfy certain conditions for being *well-defined* (Definition 5).
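The loop can be sketched for a single one-dimensional equality constraint as follows (a minimal illustration; the interval enclosure is hand-rolled for a monotone function, whereas the real framework uses general interval constraint propagation):

```python
def solve(f_range, box, delta):
    # Branch-and-prune sketch for one equality constraint f(x) = 0.
    # f_range(lo, hi) returns an interval enclosure of f over [lo, hi].
    # Returns a box of width <= delta that may contain a zero (delta-sat),
    # or None if every box is pruned away (unsat).
    stack = [box]
    while stack:
        lo, hi = stack.pop()
        flo, fhi = f_range(lo, hi)
        if flo > 0 or fhi < 0:       # prune: 0 is not in the enclosure
            continue
        if hi - lo <= delta:         # small enough: report delta-sat
            return (lo, hi)
        mid = (lo + hi) / 2          # branch: split the interval in half
        stack.append((lo, mid))
        stack.append((mid, hi))
    return None

# f(x) = x^2 - 2 on [0, 2]; x^2 is monotone on nonnegative inputs.
enclosure = lambda lo, hi: (lo * lo - 2, hi * hi - 2)
lo, hi = solve(enclosure, (0.0, 2.0), 1e-3)
print(lo, hi)  # a tiny interval around sqrt(2) ≈ 1.41421
```

Because the enclosure here is exact, every surviving box contains the root, so the returned δ-wide box is a genuine witness; with over-approximating enclosures one only gets the δ-sat guarantee.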

### **3 Algorithm**

The core idea of our algorithm for solving CNF∀-formulas is as follows. We view the universally quantified constraints as a special type of pruning operators, which can be used to reduce possible values for the free variables based on their consistency with the universally-quantified variables. We then use these special ∀-pruning operators in an overall branch-and-prune framework to solve the full formula in a δ-complete way. A special technical difficulty for ensuring δ-completeness is to control numerical errors in the recursive search for counterexamples, which we solve using *double-sided error control*. We also improve the quality of counterexamples using local-optimization algorithms in the ∀-pruning operations, which we call *locally-optimized counterexamples*.

In the following sections we describe these steps in detail. For notational simplicity we will omit vector symbols and assume all variable names can directly refer to vectors of variables.

# **3.1** *∀***-Clauses as Pruning Operators**

Consider an arbitrary CNF∀-formula<sup>1</sup>

$$\varphi(x) := \bigwedge\_{i=0}^{m} \left( \forall y (\bigvee\_{j=0}^{k\_i} f\_{ij}(x, y) \ge 0) \right).$$

It is a conjunction of ∀-clauses as defined in Definition 2. Consequently, we only need to define pruning operators for ∀-clauses so that they can be used in a standard branch-and-prune framework. The full algorithm for such pruning operation is described in Algorithm 2.

**Algorithm 2.** ∀-Clause Pruning

```
 1: function Prune(B_x, B_y, ∀y ⋁_{i=0}^{k} f_i(x, y) ≥ 0, δ′, ε, δ)
 2:     repeat
 3:         B_x^prev ← B_x
 4:         ψ ← ⋀_{i=0}^{k} f_i(x, y) < 0            ▷ Negation of the clause body.
 5:         ψ^{+ε} ← Strengthen(ψ, ε)
 6:         b ← Solve(y, ψ^{+ε}, δ′)                 ▷ 0 < δ′ < ε < δ should hold.
 7:         if b = ∅ then
 8:             return B_x                           ▷ No counterexample found, stop pruning.
 9:         end if
10:         for i ∈ {0, ..., k} do
11:             B_i ← B_x ∩ Prune(B_x, f_i(x, b) ≥ 0)  ▷ Ordinary pruning with witness b.
12:         end for
13:         B_x ← ⊔_{i=0}^{k} B_i                    ▷ Box hull of the pruned boxes.
14:     until B_x = B_x^prev
15:     return B_x
16: end function
```
In Algorithm 2, the basic idea is to use special y values that witness the *negation* of the original constraint to prune the box assignment on x. The two core steps are as follows.

<sup>1</sup> Note that without loss of generality we only use nonstrict inequality here, since in the context of δ-decisions the distinction between strict and nonstrict inequalities is not important, as explained in Definition 3.


We can now put the pruning operators defined for all ∀-clauses in the overall branch-and-prune framework shown in Algorithm 1.

The pruning algorithms are inspired by the CEGIS loop, but differ in multiple ways. First, we never explicitly compute any candidate solution for the free variables. Instead, we only prune their domain boxes. This ensures that the size of the domain box decreases (together with branching operations), and the algorithm terminates. Secondly, we do not explicitly maintain a collection of constraints. Each pruning operation works on the box produced by the previous one, that is, the learning is done at the model level instead of the constraint level. On the other hand, being unable to maintain arbitrary Boolean combinations of constraints requires us to be more sensitive to the type of Boolean operations needed in the pruning results, which is different from the CEGIS approach that treats solvers as black boxes.
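A toy instance of this ∀-pruning idea, for the clause ∀y. f(y) − f(x) ≥ 0 with f(t) = (t − 0.3)² on [0, 1] (so x must be a global minimizer of f), can be sketched as follows; grid search stands in for the recursive δ-decision counterexample query, and the interval contraction for this particular convex f is done analytically (all values invented for illustration):

```python
def forall_prune(box, eps, samples=1000):
    # Pruning operator sketch for the clause  ∀y. f(y) - f(x) >= 0,
    # with f(t) = (t - 0.3)**2 and y ranging over [0, 1].
    f = lambda t: (t - 0.3) ** 2
    lo, hi = box
    grid = [i / samples for i in range(samples + 1)]
    while True:
        # eps-strengthened counterexample search: find b with
        # f(b) <= f(x) - eps for some x in the box; f is convex, so its
        # maximum over [lo, hi] is attained at an endpoint.
        b = min(grid, key=f)
        fx_max = max(f(lo), f(hi))
        if not f(b) <= fx_max - eps:
            return (lo, hi)  # no counterexample found: stop pruning
        # Ordinary pruning with f(x) <= f(b), i.e. |x - 0.3| <= |b - 0.3|.
        r = abs(b - 0.3)
        lo, hi = max(lo, 0.3 - r), min(hi, 0.3 + r)

print(forall_prune((0.0, 1.0), 1e-3))  # → (0.3, 0.3)
```

The counterexample b (here the sampled minimizer of f) is turned into the ordinary constraint f(x) ≤ f(b), which contracts the box; when the strengthened query has no witness, pruning stops, mirroring Lines 6–13 of Algorithm 2.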

### **3.2 Double-Sided Error Control**

To ensure the correctness of Algorithm 2, it is necessary to avoid spurious counterexamples which do *not* satisfy the negation of the quantified part in a ∀-clause. We illustrate this condition by considering a *wrong* variant of Algorithm 2 in which we drop the strengthening operation on Line 5 and try to find a counterexample by directly executing b ← Solve(y, ψ = ⋀<sub>i=0</sub><sup>k</sup> f<sub>i</sub>(x, y) < 0, δ′). Note that the counterexample query ψ can be highly nonlinear in general, and need not fall into a decidable fragment. As a result, we must employ a delta-decision procedure (i.e., Solve with δ′ <sup>∈</sup> <sup>Q</sup><sup>+</sup>) to find a counterexample. A consequence of relying on a delta-decision procedure in the counterexample generation step is that we may obtain a spurious counterexample b such that for some x <sup>=</sup> a:

$$\bigwedge\_{i=0}^{k} f\_i(a,b) \le \delta' \quad \text{instead of} \quad \bigwedge\_{i=0}^{k} f\_i(a,b) < 0.$$

Consequently, the following pruning operations fail to reduce their input boxes, because a spurious counterexample does not witness any inconsistency between x and y. As a result, the fixedpoint loop in this <sup>∀</sup>-clause pruning algorithm terminates immediately after the first iteration. This makes the outermost branch-and-prune framework (Algorithm 1), which employs this pruning algorithm, rely solely on branching operations. It may then claim that the problem is δ-satisfiable while providing an arbitrary box B as a model, which is small enough (‖B‖ <sup>≤</sup> δ) but does not include a δ-solution.

To avoid spurious counterexamples, we directly strengthen the counterexample query with ε <sup>∈</sup> <sup>Q</sup><sup>+</sup> to obtain

$$\psi^{+\varepsilon} := \bigwedge\_{i=0}^{k} f\_i(x, y) \le -\varepsilon.$$

Then we choose a weakening parameter δ′ <sup>∈</sup> <sup>Q</sup><sup>+</sup> for solving the strengthened formula. By analyzing the two possible outcomes of this counterexample search, we derive the constraints on δ′, ε, and δ which guarantee the correctness of Algorithm 2:

– δ′**-sat case:** We have a and b such that ⋀<sub>i=0</sub><sup>k</sup> f<sub>i</sub>(a, b) ≤ −ε + δ′. For y <sup>=</sup> b to be a valid counterexample, we need <sup>−</sup>ε <sup>+</sup> δ′ < 0. That is, we have

$$
\delta' < \varepsilon. \tag{2}
$$

In other words, the strengthening factor ε should be greater than the weakening parameter δ′ in the counterexample search step.

– **unsat case:** The absence of counterexamples proves that ∀y ⋁<sub>i=0</sub><sup>k</sup> f<sub>i</sub>(x, y) ≥ −<sup>ε</sup> for all <sup>x</sup> <sup>∈</sup> <sup>B</sup><sub>x</sub>. Recall that we want to show that ∀y ⋁<sub>i=0</sub><sup>k</sup> f<sub>i</sub>(x, y) ≥ −<sup>δ</sup> holds for some <sup>x</sup> <sup>=</sup> <sup>a</sup> when Algorithm <sup>1</sup> uses this pruning algorithm and returns δ-sat. To ensure this property, we need the following constraint on ε and δ:

$$
\varepsilon < \delta.\tag{3}
$$
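The role of the two parameters can be seen on a toy instance (all values invented for illustration): take f(t) = t² with the true minimizer x = 0. An unstrengthened δ′-weakened counterexample search returns a spurious witness, while the ε-strengthened search correctly returns none:

```python
def delta_solve(query, grid, slack):
    # Stand-in for a delta-decision counterexample search: return some y
    # whose query value is <= slack, or None if none exists.
    return next((y for y in grid if query(y) <= slack), None)

f = lambda t: t * t
x = 0.0                                    # the true global minimizer on [-1, 1]
query = lambda y: f(y) - f(x)              # negated clause asks for f(y) < f(x)
grid = [i / 100 for i in range(-100, 101)]

delta_prime, eps = 0.001, 0.01             # must satisfy 0 < delta' < eps < delta

# Without strengthening, the delta'-weakened search admits a spurious witness:
print(delta_solve(query, grid, slack=delta_prime))  # → -0.03 (f(y) = 0.0009, not < 0)

# With eps-strengthening, no y satisfies f(y) - f(x) <= -eps: pruning stops.
print(delta_solve(query, grid, slack=-eps))         # → None
```

The spurious witness satisfies the δ′-relaxed query without actually violating the clause, so it would prune nothing; the strengthened query filters it out, which is exactly the constraint δ′ < ε.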

### **3.3 Locally-Optimized Counterexamples**

The performance of the pruning algorithm for CNF∀-formulas depends on the quality of the counterexamples found during the search.

Figure 1a illustrates this point by visualizing a pruning process for an unconstrained minimization problem, <sup>∃</sup>x <sup>∈</sup> X<sub>0</sub> <sup>∀</sup><sup>y</sup> <sup>∈</sup> <sup>X</sup><sub>0</sub>, f(x) <sup>≤</sup> <sup>f</sup>(y). As it finds a series of counterexamples CE<sub>1</sub>, CE<sub>2</sub>, CE<sub>3</sub>, and CE<sub>4</sub>, the pruning algorithm uses these counterexamples to contract the interval assignment on <sup>X</sup> from <sup>X</sup><sub>0</sub> to <sup>X</sup><sub>1</sub>, <sup>X</sup><sub>2</sub>, <sup>X</sup><sub>3</sub>, and <sup>X</sup><sub>4</sub> in sequence. In the search for a counterexample (Line 6 of Algorithm 2), it solves the strengthened query f(x) > f(y) + ε. Note that the query only requires a counterexample y <sup>=</sup> b to be ε-away from a candidate x, while it is clear that the further a counterexample is from the candidates, the more effective the pruning algorithm is.

**Fig. 1.** Illustrations of the pruning algorithm for CNF∀-formula with and without using local optimization.

Based on this observation, we present a way to improve the performance of the pruning algorithm for CNF∀-formulas. After we obtain a counterexample b, we locally optimize it with respect to the counterexample query ψ so that it "further violates" the constraints. Figure 1b illustrates this idea. The algorithm first finds a counterexample CE<sub>1</sub>, then refines it to CE′<sub>1</sub> using a local-optimization algorithm (similarly, CE<sub>2</sub> → CE′<sub>2</sub>). Clearly, this refined counterexample gives stronger pruning power than the original one. This refinement process can also help the performance of the algorithm by reducing the total number of iterations in the fixedpoint loop.

The suggested method is based on the assumption that local-optimization techniques are cheaper than finding a global counterexample using interval propagation techniques. In our experiments, we observed that this assumption holds practically. We will report the details in Sect. 5.
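The refinement step can be sketched with a few plain gradient-descent iterations standing in for the NLopt call (toy objective and step size are ours, for illustration only):

```python
def refine(y0, grad, step=0.1, iters=50):
    # A few gradient-descent steps push the counterexample further
    # downhill, i.e. make it "further violate" the constraint.
    y = y0
    for _ in range(iters):
        y -= step * grad(y)
    return y

f = lambda y: (y - 0.3) ** 2     # objective the ∀-clause compares against
grad = lambda y: 2 * (y - 0.3)

y_ce = 0.9                       # a weak counterexample found first
y_ref = refine(y_ce, grad)
print(f(y_ref) < f(y_ce))        # → True: the refined one prunes more
```

The refined point is a strictly better witness of the violated constraint, so the subsequent ordinary pruning step contracts the box further, as in Fig. 1b.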

### **4** *δ***-Completeness**

We now prove that the proposed algorithm is δ-complete for arbitrary CNF<sup>∀</sup> formulas in <sup>L</sup><sup>R</sup>*<sup>F</sup>* . In the work of [2], <sup>δ</sup>-completeness has been proved for branch-and-prune for ground SMT problems, under the assumption that the pruning operators are *well-defined*. Thus, the key for our proof here is to show that the ∀-pruning operators satisfy the conditions of well-definedness.

The notion of a well-defined pruning operator is defined in [2] as follows.

**Definition 5.** *Let* φ *be a constraint, and* <sup>B</sup> *be the set of all boxes in* <sup>R</sup><sup>n</sup>*. A pruning operator is a function* Prune : B×C → B*. We say such a pruning operator is well-defined, if for any* B ∈ B*, the following conditions are true:*


We will explain the intuition behind these requirements in the next proof, which aims to establish that Algorithm 2 defines a well-defined pruning operator.

**Lemma 1 (Well-Definedness of** ∀**-Pruning).** *Consider an arbitrary* ∀ *clause in the generic form*

$$c(x) := \forall y \Big( f\_1(x, y) \ge 0 \lor \dots \lor f\_k(x, y) \ge 0 \Big).$$

*If the pruning operators for* <sup>f</sup><sub>1</sub> <sup>≥</sup> <sup>0</sup>, ..., f<sub>k</sub> <sup>≥</sup> <sup>0</sup> *are well-defined, then the* <sup>∀</sup>*-pruning operation for* c(x) *as described in Algorithm <sup>2</sup> is well-defined.*

*Proof.* We prove that the pruning operator defined by Algorithm 2 satisfies the three conditions in Definition 5. Let B<sub>0</sub>, ..., B<sub>k</sub> be a sequence of boxes, where B<sub>0</sub> is the input box B<sub>x</sub> and B<sub>k</sub> is the returned box, which is possibly empty.

The first condition requires that the pruning operation for c(x) is reductive, that is, B<sub>x</sub> ⊆ B<sub>x</sub><sup>prev</sup> holds in Algorithm 2. If no counterexample is found (Line 8), we have B<sub>x</sub> = B<sub>x</sub><sup>prev</sup>, so the condition holds trivially. Consider the case where a counterexample b is found. The pruned box B<sub>x</sub> is obtained through the box hull of all the B<sub>i</sub> boxes (Line 13), which are the results of pruning B<sub>x</sub><sup>prev</sup> using ordinary constraints of the form f<sub>i</sub>(x, b) ≥ 0 (Line 11) for the counterexample b. Following the assumption that the pruning operators are well-defined for each ordinary constraint f<sub>i</sub> used in the algorithm, we know that B<sub>i</sub> ⊆ B<sub>x</sub><sup>prev</sup> holds as a loop invariant for the loop from Line 10 to Line 12. Thus, taking the box hull of all the B<sub>i</sub>, we obtain a B<sub>x</sub> that is still a subset of B<sub>x</sub><sup>prev</sup>.

The second condition requires that the pruning operation does not eliminate real solutions. By assumption, the pruning operation on Line 11 does not lose any valid assignment on x that makes the ∀-clause true. In fact, since y is universally quantified, any choice of assignment y <sup>=</sup> b preserves the solutions on x as long as the ordinary pruning operator is well-defined. Thus, this condition is easily satisfied.

The third condition is the most nontrivial to establish. It ensures that when the pruning operator does not prune a box to the empty set, the box should not be "way off", and in fact should contain points that satisfy an appropriate relaxation of the constraint. We can say this is a notion of "faithfulness" of the pruning operator. For constraints defined by simple continuous functions, this can typically be guaranteed by the modulus of continuity of the function (Lipschitz constants as a special case). Now, in the case of ∀-clause pruning, we need to prove that the faithfulness of the ordinary pruning operators that are used translates into the faithfulness of the ∀-clause pruning results. First of all, this condition would not hold if we did not have the strengthening operation when searching for counterexamples (Line 5). As is shown in Example 1, because of the weakening that δ-decisions introduce when searching for a counterexample, we may obtain a *spurious counterexample* that does not have pruning power. In other words, if we keep using a wrong counterexample that already satisfies the condition, then we are not able to rule out wrong assignments on x. Now, since we have introduced ε-strengthening in the counterexample search, we know that b obtained on Line 6 is a true counterexample. Thus, for some x <sup>=</sup> a, f<sub>i</sub>(a, b) < 0 for every i. By assumption, the ordinary pruning operation using b on Line 11 guarantees faithfulness. That is, suppose the pruned result B<sub>i</sub> is not empty and ‖B<sub>i</sub>‖ <sup>≤</sup> <sup>ε</sup>; then there exists a constant <sup>c</sup><sub>i</sub> such that <sup>f</sup><sub>i</sub>(x, b) ≥ −c<sub>i</sub><sup>ε</sup> is true. Thus, we can take <sup>c</sup> = max<sub>i</sub> <sup>c</sup><sub>i</sub> as the constant for the pruning operator defined by the full clause, and conclude that the disjunction ⋁<sub>i=0</sub><sup>k</sup> f<sub>i</sub>(x, y) ≥ <sup>−</sup>cε holds when ‖B<sub>x</sub>‖ <sup>≤</sup> ε.

Using the lemma, we follow the results in [2], and conclude that the branch-andprune method in Algorithm 1 is delta-complete:

**Theorem 1 (**δ**-Completeness).** *For any* <sup>δ</sup> <sup>∈</sup> <sup>Q</sup><sup>+</sup>*, using the proposed* <sup>∀</sup> *pruning operators defined in Algorithm 2 in the branch-and-prune framework described in Algorithm <sup>1</sup> is* <sup>δ</sup>*-complete for the class of CNF*∀*-formulas in* <sup>L</sup><sup>R</sup>*<sup>F</sup> , assuming that the pruning operators for all the base functions are well-defined.*

*Proof.* Following Theorem 4.2 (δ-Completeness of ICPε) in [2], a branch-and-prune algorithm is δ-complete iff the pruning operators in the algorithm are all well-defined. Following Lemma 1, Algorithm 2 always defines well-defined pruning operators, assuming the pruning operators for the base functions are well-defined. Consequently, Algorithms 1 and 2 together define a delta-complete decision procedure for CNF∀-problems in L<sup>R</sup>*<sup>F</sup>* .

# **5 Evaluation**

*Implementation.* We implemented the algorithms on top of dReal [21], an open-source delta-SMT framework. We used IBEX-lib [22] for interval constraint pruning and CLP [23] for linear programming. For local optimization, we used NLopt [24]. In particular, we used the SLSQP (Sequential Least-Squares Quadratic Programming) local-optimization algorithm [25] for differentiable constraints and the COBYLA (Constrained Optimization BY Linear Approximations) local-optimization algorithm [26] for non-differentiable constraints. The prototype solver is able to handle ∃∀-formulas that involve most standard elementary functions, including power, exp, log, √·, trigonometric functions (sin, cos, tan), inverse trigonometric functions (arcsin, arccos, arctan), hyperbolic functions (sinh, cosh, tanh), etc.

*Experiment environment.* All experiments were run on a 2017 MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB RAM running macOS 10.13.4. All code and benchmarks are available at https://github.com/dreal/CAV18.

*Parameters.* In the experiments, we chose the strengthening parameter ε <sup>=</sup> <sup>0</sup>.99δ and the weakening parameter in the counterexample search δ′ = 0.98δ. In each call to NLopt, we used 1e–6 for both the absolute and relative tolerances on the function value, 1e–3 s for the timeout, and 100 for the maximum number of evaluations. These values are used as stopping criteria in NLopt.

**Table 1.** Experimental results for nonlinear global optimization problems: The first 19 problems (Ackley 2D – Zettl) are unconstrained optimization problems and the last five problems (Rosenbrock Cubic – Simionescu) are constrained optimization problems. We ran our prototype solver over those instances with and without local-optimization option ("L-Opt." and "No L-Opt." columns) and compared the results. We chose δ = 0.0001 for all instances.


### **5.1 Nonlinear Global Optimization**

We encoded a range of highly nonlinear ∃∀-problems from the constrained and unconstrained optimization literature [27,28]. Note that the standard optimization problem

$$\min f(x) \text{ s.t. } \varphi(x), \quad x \in \mathbb{R}^n,$$

can be encoded as the logic formula:

$$
\varphi(x) \land \forall y \Big(\varphi(y) \to f(x) \le f(y)\Big).
$$
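As a sanity check of this encoding on a concrete instance (invented for illustration), min (x − 1)² s.t. x ≥ 0 can be tested by brute force over a sampled domain, using the δ-weakened comparison f(x) ≤ f(y) + δ:

```python
delta = 1e-3
f = lambda x: (x - 1) ** 2
phi = lambda x: x >= 0                       # constraint of the toy instance
grid = [i / 100 for i in range(-200, 201)]   # sampled stand-in for the reals

def is_solution(x):
    # phi(x) and (for all y) phi(y) -> f(x) <= f(y) + delta
    return phi(x) and all(f(x) <= f(y) + delta for y in grid if phi(y))

print(is_solution(1.0), is_solution(0.5))    # → True False
```

The true minimizer x = 1 satisfies the encoded formula while a non-optimal feasible point does not; the actual solver of course works over continuous boxes rather than a grid.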

**Fig. 2.** Nonlinear global optimization examples.

As plotted in Fig. 2, these optimization problems are non-trivial: they are highly non-convex problems that are designed to test global optimization or genetic programming algorithms. Many such functions have a large number of local minima. For example, the Ripple 1 function [27]

$$f(x\_1, x\_2) = \sum\_{i=1}^{2} -e^{-2\left(\log 2\right)\left(\frac{x\_i - 0.1}{0.8}\right)^2} \left(\sin^6(5\pi x\_i) + 0.1\cos^2(500\pi x\_i)\right)$$

defined on x<sub>i</sub> <sup>∈</sup> [0, 1] has 252004 local minima, with the global minimum f(0.1, <sup>0</sup>.1) = <sup>−</sup>2.2. As a result, local-optimization algorithms such as gradient descent would not work on these problems by themselves. By encoding them as ∃∀-problems, we can perform guaranteed global optimization on these problems.
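The reported minimum can be reproduced by direct evaluation of the Ripple 1 function at its minimizer (plain Python, no solver involved):

```python
import math

def ripple1(x1, x2):
    # Ripple 1: a sum of two identical one-dimensional terms.
    total = 0.0
    for xi in (x1, x2):
        envelope = math.exp(-2 * math.log(2) * ((xi - 0.1) / 0.8) ** 2)
        total -= envelope * (math.sin(5 * math.pi * xi) ** 6
                             + 0.1 * math.cos(500 * math.pi * xi) ** 2)
    return total

print(round(ripple1(0.1, 0.1), 6))  # → -2.2, the known global minimum
```

At x<sub>i</sub> = 0.1 the Gaussian envelope equals 1 and each oscillatory factor equals 1.1, giving −1.1 per term; the 500π cosine term is what creates the huge number of local minima elsewhere.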

Table 1 provides a summary of the experimental results. First, it shows that we can find minimum values which are close to the known global solutions. Second, it shows that enabling the local-optimization technique speeds up the solving process significantly for 20 of the 23 instances.

# **5.2 Synthesizing Lyapunov Function for Dynamical System**

We show that the proposed algorithm is able to synthesize Lyapunov functions for nonlinear dynamical systems described by a set of ODEs:

$$
\dot{\boldsymbol{x}}(t) = f\_i(\boldsymbol{x}(t)), \quad \forall \boldsymbol{x}(t) \in X\_i.
$$

Our approach is different from recent related work [29], where dReal was used only to verify a candidate function that was found by a simulation-guided algorithm. In contrast, we perform both the search and verification steps by solving a single ∃∀-formula. Note that to verify a Lyapunov candidate function v : X <sup>→</sup> <sup>R</sup><sup>+</sup>, <sup>v</sup> must satisfy the following conditions:

$$
\forall x \in X \setminus \{\mathbf{0}\}, \quad v(x) > 0
$$

$$
\forall x \in X, \quad \nabla v(x(t))^T \cdot f\_i(x(t)) \le 0.
$$

We assume that a Lyapunov function is a polynomial of some fixed degree over *<sup>x</sup>*, that is, v(*x*) = *<sup>z</sup>* <sup>T</sup>*Pz* where *<sup>z</sup>* is a vector of monomials over *<sup>x</sup>* and P is a symmetric matrix. Then, we can encode this synthesis problem into the ∃∀-formula:

$$\begin{aligned} \exists \mathbf{P} \big[ & (v(\mathbf{z}) = \mathbf{z}^T \mathbf{P} \mathbf{z}) \;\land \\ & (\forall \mathbf{z} \in X \setminus \{\mathbf{0}\},\; v(\mathbf{z}) > 0) \;\land \\ & (\forall \mathbf{z} \in X,\; \nabla v(\mathbf{z}(t))^T \cdot f\_i(\mathbf{z}(t)) \le 0) \big] \end{aligned}$$

In the following sections, we show that we can handle two examples in [29].

**Normalized Pendulum.** Given a standard pendulum system with normalized parameters

$$
\begin{bmatrix}
\dot{x}\_1\\ \dot{x}\_2
\end{bmatrix} = \begin{bmatrix}
x\_2\\ -\sin(x\_1) - x\_2
\end{bmatrix}
$$

and a quadratic template for a Lyapunov function v(*x*) = *x*<sup>T</sup>*Px* <sup>=</sup> c<sub>1</sub>x<sub>1</sub>x<sub>2</sub> + c<sub>2</sub>x<sub>1</sub><sup>2</sup> + c<sub>3</sub>x<sub>2</sub><sup>2</sup>, we can encode this synthesis problem into the following ∃∀-formula:

$$\begin{aligned} \exists c\_1 c\_2 c\_3\, \forall x\_1 x\_2\, \big[ & \big( (50c\_3 x\_1 x\_2 + 50x\_1^2 c\_1 + 50x\_2^2 c\_2 > 0.5) \;\land \\ & \;\; (100c\_1 x\_1 x\_2 + 50x\_2^2 c\_3 + (-x\_2 - \sin(x\_1))(50x\_1 c\_3 + 100x\_2 c\_2) < -0.5) \big) \\ & \lor\; \neg \big( (0.01 \le x\_1^2 + x\_2^2) \land (x\_1^2 + x\_2^2 \le 1) \big) \big] \end{aligned}$$

Our prototype solver takes 44.184 s to synthesize the following function as a solution to the problem for the bounds *x* <sup>∈</sup> [0.1, 1.0] and <sup>c</sup><sub>i</sub> <sup>∈</sup> [0.1, 100] using δ = 0.05:

$$v = 40.6843x\_1x\_2 + 35.6870x\_1^2 + 84.3906x\_2^2.$$
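The synthesized candidate can be spot-checked numerically; the grid and tolerances below are ours, and a grid check is of course not a substitute for the δ-complete verification performed by the solver:

```python
import math

# Coefficients of the synthesized Lyapunov candidate reported above.
C1, C2, C3 = 40.6843, 35.6870, 84.3906

def v(x1, x2):
    return C1 * x1 * x2 + C2 * x1 ** 2 + C3 * x2 ** 2

def lie_derivative(x1, x2):
    # dv/dt = grad(v) . f  along the pendulum dynamics f = (x2, -sin(x1) - x2)
    dv1 = C1 * x2 + 2 * C2 * x1
    dv2 = C1 * x1 + 2 * C3 * x2
    return dv1 * x2 + dv2 * (-math.sin(x1) - x2)

# Spot-check the two Lyapunov conditions on a grid over x in [0.1, 1.0]^2.
pts = [(0.1 + 0.1 * i, 0.1 + 0.1 * j) for i in range(10) for j in range(10)]
ok = all(v(x1, x2) > 0 and lie_derivative(x1, x2) <= 0 for x1, x2 in pts)
print(ok)  # → True on this grid
```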

**Damped Mathieu System.** Mathieu dynamics are time-varying and defined by the following ODEs:

$$
\begin{bmatrix}
\dot{x}\_1\\ \dot{x}\_2
\end{bmatrix} = \begin{bmatrix}
x\_2\\ -x\_2 - (2 + \sin(t))x\_1
\end{bmatrix}.
$$

Using a quadratic template for a Lyapunov function v(*x*) = *x*<sup>T</sup>*Px* <sup>=</sup> c<sub>1</sub>x<sub>1</sub>x<sub>2</sub> + c<sub>2</sub>x<sub>1</sub><sup>2</sup> + c<sub>3</sub>x<sub>2</sub><sup>2</sup>, we can encode this synthesis problem into the following ∃∀-formula:

$$\begin{aligned} \exists c\_1 c\_2 c\_3\, \forall x\_1 x\_2 t\, \big[ & \big( (50x\_1 x\_2 c\_2 + 50x\_1^2 c\_1 + 50x\_2^2 c\_3 > 0) \;\land \\ & \;\; (100c\_1 x\_1 x\_2 + 50x\_2^2 c\_2 + (-x\_2 - x\_1(2 + \sin(t)))(50x\_1 c\_2 + 100x\_2 c\_3) < 0) \big) \\ & \lor\; \neg \big( (0.01 \le x\_1^2 + x\_2^2) \land (0.1 \le t) \land (t \le 1) \land (x\_1^2 + x\_2^2 \le 1) \big) \big] \end{aligned}$$

Our prototype solver takes 26.533 s to synthesize the following function as a solution to the problem for the bounds *x* <sup>∈</sup> [0.1, 1.0], <sup>t</sup> <sup>∈</sup> [0.1, 1.0], and <sup>c</sup><sub>i</sub> <sup>∈</sup> [45, 98] using δ = 0.05:

$$V = 54.6950x\_1x\_2 + 90.2849x\_1^2 + 50.5376x\_2^2.$$
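The same numeric spot-check applies to the Mathieu certificate, now sampling the time variable t over [0.1, 1.0] as well (again only an illustration, with the dynamics taken from the ODEs above):

```python
import math

# Coefficients read off the solution above: V = a*x1*x2 + b*x1^2 + c*x2^2.
a, b, c = 54.6950, 90.2849, 50.5376

def V(x1, x2):
    return a * x1 * x2 + b * x1**2 + c * x2**2

def V_dot(x1, x2, t):
    # Lie derivative along the time-varying dynamics x1' = x2, x2' = -x2 - (2 + sin t)*x1
    return (a * x2 + 2 * b * x1) * x2 + (a * x1 + 2 * c * x2) * (-x2 - (2 + math.sin(t)) * x1)

grid = [i / 20 for i in range(-20, 21)]
pts = [(x1, x2) for x1 in grid for x2 in grid if 0.01 <= x1**2 + x2**2 <= 1.0]
assert all(V(x1, x2) > 0 for x1, x2 in pts)
assert all(V_dot(x1, x2, t) < 0 for x1, x2 in pts for t in (0.1, 0.25, 0.5, 0.75, 1.0))
print(f"conditions hold at {len(pts)} points and 5 time samples")
```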

# **6 Conclusion**

We have described delta-decision procedures for solving exists-forall formulas in the first-order theory over the reals with computable real functions. These formulas can encode a wide range of hard practical problems, such as general constrained optimization and nonlinear control synthesis. We use a branch-and-prune framework and design special pruning operators for universally quantified constraints such that the procedures can be proved to be delta-complete, where suitable control of numerical errors is crucial. We demonstrated the effectiveness of the procedures on various global optimization and Lyapunov function synthesis problems.



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Solving Quantified Bit-Vectors Using Invertibility Conditions**

Aina Niemetz<sup>1</sup>, Mathias Preiner<sup>1</sup>, Andrew Reynolds<sup>2</sup>, Clark Barrett<sup>1</sup>, and Cesare Tinelli<sup>2</sup>

<sup>1</sup> Stanford University, Stanford, USA — niemetz@cs.stanford.edu
<sup>2</sup> The University of Iowa, Iowa City, USA

**Abstract.** We present a novel approach for solving quantified bit-vector formulas in Satisfiability Modulo Theories (SMT) based on computing symbolic inverses of bit-vector operators. We derive conditions that precisely characterize when bit-vector constraints are invertible for a representative set of bit-vector operators commonly supported by SMT solvers. We utilize syntax-guided synthesis techniques to aid in establishing these conditions and verify them independently by using several SMT solvers. We show that invertibility conditions can be embedded into quantifier instantiations using Hilbert choice expressions, and give experimental evidence that a counterexample-guided approach for quantifier instantiation utilizing these techniques leads to performance improvements with respect to state-of-the-art solvers for quantified bit-vector constraints.

# **1 Introduction**

Many applications in hardware and software verification rely on Satisfiability Modulo Theories (SMT) solvers for bit-precise reasoning. In recent years, the quantifier-free fragment of the theory of fixed-size bit-vectors has received a lot of interest, as witnessed by the number of applications that generate problems in that fragment and by the high, and increasing, number of solvers that participate in the corresponding division of the annual SMT competition. Modeling properties of programs and circuits, e.g., universal safety properties and program invariants, however, often requires the use of *quantified* bit-vector formulas. Despite a multitude of applications, reasoning efficiently about such formulas is still a challenge in the automated reasoning community.

The majority of solvers that support quantified bit-vector logics employ instantiation-based techniques [8,21,22,25], which aim to find conflicting ground instances of quantified formulas. For that, it is crucial to select good instantiations for the universal variables, or else the solver may be overwhelmed by the number of ground instances generated. For example, consider a quantified formula ψ = ∀x. (x + s ≉ t) where x, s and t denote bit-vectors of size 32. To prove that ψ is unsatisfiable we can instantiate x with all 2<sup>32</sup> possible bit-vector values. Ideally, however, we would like to find a proof that requires far fewer instantiations. In this example, if we instantiate x with the symbolic term t − s (the inverse of x + s ≈ t when solved for x), we can immediately conclude that ψ is unsatisfiable, since (t − s) + s ≉ t simplifies to false.

This work was partially supported by DARPA under award No. FA8750-15-C-0113 and the National Science Foundation under award No. 1656926.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 236–255, 2018. https://doi.org/10.1007/978-3-319-96142-2_16

Operators in the theory of bit-vectors are not always invertible. However, we observe that it is possible to identify quantifier-free conditions that precisely *characterize* when they are. We do that for a representative set of operators in the standard theory of bit-vectors supported by SMT solvers. For example, we have proven that the constraint x · s ≈ t is solvable for x if and only if (−s | s) & t ≈ t is satisfiable. Using this observation, we develop a novel approach for solving quantified bit-vector formulas that utilizes invertibility conditions to generate symbolic instantiations. We show that invertibility conditions can be embedded into quantifier instantiations using Hilbert choice functions in a sound manner. This approach has compelling advantages with respect to previous approaches, which we demonstrate in our experiments.
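This multiplication example can be replayed exhaustively at a small bit-width; the following Python sketch (our illustration, not the verification setup used later in the paper) confirms the equivalence for all 4-bit values of s and t:

```python
# Exhaustive 4-bit check: x * s = t (mod 2^4) has a solution x exactly when
# (-s | s) & t = t, with all operations taken modulo 2^4.
W = 4
MASK = (1 << W) - 1

def has_solution(s, t):
    return any((x * s) & MASK == t for x in range(1 << W))

def condition(s, t):
    return ((-s & MASK) | s) & t == t

assert all(has_solution(s, t) == condition(s, t)
           for s in range(1 << W) for t in range(1 << W))
print("invertibility condition confirmed for all 4-bit s, t")
```

Intuitively, −s | s sets every bit at or above the lowest set bit of s, so the condition states that t has no set bits below it.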

More specifically, this paper makes the following *contributions*.


*Related Work.* Quantified bit-vector logics are currently supported by the SMT solvers Boolector [16], CVC4 [2], Yices [7], and Z3 [6], and by a Binary Decision Diagram (BDD)-based tool called Q3B [14]. Out of these, only CVC4 and Z3 provide support for combining quantified bit-vectors with other theories, e.g., the theories of arrays or real arithmetic. Arbitrarily nested quantifiers are handled by all but Yices, which only supports bit-vector formulas of the form ∃*x*∀*y*. Q[*x*, *y*] [8]. For quantified bit-vectors, CVC4 employs counterexample-guided quantifier instantiation (CEGQI) [22], where concrete models of a set of ground instances and the negation of the input formula (the counterexamples) serve as instantiations for the universal variables. In Z3, model-based quantifier instantiation (MBQI) [10] is combined with a template-based model finding procedure [25]. In contrast to CVC4, Z3 not only relies on concrete counterexamples as candidates for quantifier instantiation but also generalizes these counterexamples to produce symbolic instantiations by selecting ground terms with the same model value. Boolector employs a syntax-guided synthesis approach to synthesize interpretations for Skolem functions based on a set of ground instances of the formula, and uses a counterexample refinement loop similar to MBQI [21]. Other counterexample-guided approaches for quantified formulas in SMT solvers have been considered by Bjørner and Janota [4] and by Reynolds et al. [23], but they have mostly targeted quantified linear arithmetic and do not specifically address bit-vectors. Quantifier elimination for a fragment of bit-vectors that covers modular linear arithmetic has been recently addressed by John and Chakraborty [13], although we do not explore that direction in this paper.

# **2 Preliminaries**

We assume the usual notions and terminology of many-sorted first-order logic with equality (denoted by ≈). Let S be a set of *sort symbols*, and for every sort σ ∈ S let X<sub>σ</sub> be an infinite set of *variables of sort* σ. We assume that the sets X<sub>σ</sub> are pairwise disjoint and define X as the union of the sets X<sub>σ</sub>. Let Σ be a *signature* consisting of a set Σ<sup>s</sup> ⊆ S of sort symbols and a set Σ<sup>f</sup> of interpreted (and sorted) function symbols f<sub>σ1···σnσ</sub> with arity n ≥ 0 and σ1, ..., σn, σ ∈ Σ<sup>s</sup>. We assume that a signature Σ includes a Boolean sort Bool and the Boolean constants ⊤ (true) and ⊥ (false). Let I be a Σ-*interpretation* that maps: each σ ∈ Σ<sup>s</sup> to a non-empty set σ<sup>I</sup> (the *domain* of I), with Bool<sup>I</sup> = {⊤, ⊥}; each x ∈ X<sub>σ</sub> to an element x<sup>I</sup> ∈ σ<sup>I</sup>; and each f<sub>σ1···σnσ</sub> ∈ Σ<sup>f</sup> to a total function f<sup>I</sup> : σ1<sup>I</sup> × ... × σn<sup>I</sup> → σ<sup>I</sup> if n > 0, and to an element of σ<sup>I</sup> if n = 0. If x ∈ X<sub>σ</sub> and v ∈ σ<sup>I</sup>, we denote by I[x ↦ v] the interpretation that maps x to v and is otherwise identical to I. We use the usual inductive definition of the satisfiability relation |= between Σ-interpretations and Σ-formulas.

We assume the usual definition of well-sorted terms, literals, and formulas as Bool terms with variables in X and symbols in Σ, and refer to them as Σ-terms, Σ-atoms, and so on. A *ground* term/formula is a Σ-term/formula without variables. We define *x* = (x<sub>1</sub>, ..., x<sub>n</sub>) as a tuple of variables and write Q*x*. ϕ with Q ∈ {∀, ∃} for a *quantified* formula Qx<sub>1</sub> ··· Qx<sub>n</sub>. ϕ. We use Lit(ϕ) to denote the set of Σ-literals of a Σ-formula ϕ. For a Σ-term or Σ-formula e, we denote the *free variables* of e (defined as usual) as FV(e) and use e[*x*] to denote that the variables in *x* occur free in e. For a tuple of Σ-terms *t* = (t<sub>1</sub>, ..., t<sub>n</sub>), we write e[*t*] for the term or formula obtained from e by simultaneously replacing each occurrence of x<sub>i</sub> in e by t<sub>i</sub>. Given a Σ-formula ϕ[x] with x ∈ X<sub>σ</sub>, we use Hilbert's *choice* operator ε [12] to describe *properties* of x. We define a *choice function* εx. ϕ[x] as a term in which x is bound by ε. In every interpretation I, εx. ϕ[x] denotes some value v ∈ σ<sup>I</sup> such that I[x ↦ v] satisfies ϕ[x] if such values exist, and denotes an arbitrary element of σ<sup>I</sup> otherwise. This means that the formula ∃x. ϕ[x] ⇔ ϕ[εx. ϕ[x]] is satisfied by every interpretation.
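Over a finite domain, the semantics of the choice operator can be modeled directly; the sketch below (a toy model over a 16-element domain, our illustration only) checks the defining equivalence ∃x. ϕ[x] ⇔ ϕ[εx. ϕ[x]] on a few sample predicates:

```python
# A finite-domain model of the choice operator: eps(phi) returns a witness of
# phi when one exists and an arbitrary domain element (here 0) otherwise.
DOMAIN = range(16)

def eps(phi):
    return next((v for v in DOMAIN if phi(v)), 0)

# The defining property  ∃x. phi[x]  <=>  phi[eps x. phi[x]]  holds for any phi:
samples = [lambda v: v > 7, lambda v: v * 3 % 16 == 5, lambda v: False]
for phi in samples:
    assert any(phi(v) for v in DOMAIN) == phi(eps(phi))
print("choice property holds for the sampled predicates")
```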

A *theory* T is a pair (Σ, I), where Σ is a signature and I is a non-empty class of Σ-interpretations (the *models* of T) that is closed under variable reassignment, i.e., every Σ-interpretation that differs from an I ∈ I only in how it interprets variables is also in I. A Σ-formula ϕ is T-*satisfiable* (resp. T-*unsatisfiable*) if it is satisfied by some (resp. no) interpretation in I; it is T-*valid* if it is satisfied by all interpretations in I. A choice function εx. ϕ[x] is *(T-)valid* if ∃x. ϕ[x] is (T-)valid. We refer to a term t as ε-*(T-)valid* if all occurrences of choice functions in t are (T-)valid. We will sometimes omit T when the theory is understood from context.

**Table 1.** Set of considered bit-vector operators with corresponding SMT-LIB 2 syntax.

We will focus on the theory T<sub>BV</sub> = (Σ<sub>BV</sub>, I<sub>BV</sub>) of fixed-size bit-vectors as defined by the SMT-LIB 2 standard [3]. The signature Σ<sub>BV</sub> includes a unique sort for each positive bit-vector width n, denoted here as σ<sub>[n]</sub>. Similarly, X<sub>[n]</sub> is the set of *bit-vector variables* of sort σ<sub>[n]</sub>, and X<sub>BV</sub> is the union of all sets X<sub>[n]</sub>. We assume that Σ<sub>BV</sub> includes all *bit-vector constants* of sort σ<sub>[n]</sub> for each n, represented as bit-strings. However, to simplify the notation we will sometimes denote them by the corresponding natural number in {0, ..., 2<sup>n</sup>−1}. All interpretations I ∈ I<sub>BV</sub> are identical except for the value they assign to variables. They interpret sort and function symbols as specified in SMT-LIB 2. All function symbols in Σ<sup>f</sup><sub>BV</sub> are overloaded for every σ<sub>[n]</sub> ∈ Σ<sup>s</sup><sub>BV</sub>. We denote a Σ<sub>BV</sub>-term (or *bit-vector term*) t of width n as t<sub>[n]</sub> when we want to specify its bit-width explicitly. We use max<sub>s[n]</sub> and min<sub>s[n]</sub> for the *maximum* and *minimum signed value* of width n, e.g., max<sub>s[4]</sub> = 0111 and min<sub>s[4]</sub> = 1000. The width of a bit-vector sort or term is given by the function κ, e.g., κ(σ<sub>[n]</sub>) = n and κ(t<sub>[n]</sub>) = n.

Without loss of generality, we consider a restricted set of bit-vector function symbols (or *bit-vector operators*) Σ<sup>f</sup><sub>BV</sub> as listed in Table 1. The selection of operators in this set is arbitrary but complete in the sense that it suffices to express all bit-vector operators defined in SMT-LIB 2.

### **3 Invertibility Conditions for Bit-Vector Constraints**

This section formally introduces the concept of an invertibility condition and shows that such conditions can be used to construct symbolic solutions for a class of quantifier-free bit-vector constraints that have a linear shape.

Consider a bit-vector literal x + s ≈ t and assume that we want to solve for x. If the literal is *linear* in x, that is, has only one occurrence of x, a general solution for x is given by the inverse of bit-vector addition over equality: x = t − s. Computing the inverse of a bit-vector operation, however, is not always possible. For example, for x · s ≈ t, an inverse always exists only if s always evaluates to an odd bit-vector. Otherwise, there are values for s and t for which no inverse exists, e.g., x · 2 ≈ 3. However, even if there is no unconditional inverse for the general case, we can identify the condition under which a bit-vector operation is invertible. For the bit-vector multiplication constraint x · s ≈ t with x ∉ FV(s) ∪ FV(t), the *invertibility condition* for x can be expressed by the formula (−s | s) & t ≈ t.

**Definition 1** *(Invertibility Condition). Let* ℓ[x] *be a* Σ<sub>BV</sub>*-literal. A quantifier-free* Σ<sub>BV</sub>*-formula* φ<sub>c</sub> *is an* invertibility condition *for* x *in* ℓ[x] *if* x ∉ FV(φ<sub>c</sub>) *and* φ<sub>c</sub> ⇔ ∃x. ℓ[x] *is* T<sub>BV</sub>*-valid.*

An invertibility condition for a literal ℓ[x] provides the *exact conditions* under which ℓ[x] is solvable for x. We call it an "invertibility" condition because we can use Hilbert choice functions to express *all* such conditional solutions with a *single* symbolic term, that is, a term whose possible values are exactly the solutions for x in ℓ[x]. Recall that a choice function εy. ϕ[y] represents a solution for a formula ϕ[x] if there exists one, and represents an arbitrary value otherwise. We may use a choice function to describe inverse solutions for a literal ℓ[x] with invertibility condition φ<sub>c</sub> as εy. (φ<sub>c</sub> ⇒ ℓ[y]). For example, for the general case of bit-vector multiplication over equality the choice function is defined as εy. ((−s | s) & t ≈ t ⇒ y · s ≈ t).

**Lemma 2.** *If* φ<sub>c</sub> *is an invertibility condition for an* ε*-valid* Σ<sub>BV</sub>*-literal* ℓ[x] *and* r *is the term* εy. (φ<sub>c</sub> ⇒ ℓ[y])*, then* r *is* ε*-valid and* ℓ[r] ⇔ ∃x. ℓ[x] *is* T<sub>BV</sub>*-valid.*<sup>1</sup>

Intuitively, the lemma states that when ℓ[x] is satisfiable (under condition φ<sub>c</sub>), any value returned by the choice function εy. (φ<sub>c</sub> ⇒ ℓ[y]) is a solution of ℓ[x] (and thus ∃x. ℓ[x] holds). Conversely, if there exists a value v for x that makes ℓ[x] true, then there is a model of T<sub>BV</sub> that interprets εy. (φ<sub>c</sub> ⇒ ℓ[y]) as v.

Now, suppose that a Σ<sub>BV</sub>-literal ℓ is again linear in x but that x occurs arbitrarily deep in ℓ. Consider, for example, a literal s<sub>1</sub> · (s<sub>2</sub> + x) ≈ t where x does not occur in s<sub>1</sub>, s<sub>2</sub> or t. We can solve this literal for x by recursively computing the (possibly conditional) inverses of all bit-vector operations that involve x. That is, first we solve s<sub>1</sub> · x′ ≈ t for x′, where x′ is a fresh variable abstracting s<sub>2</sub> + x, which yields the choice function x′ = εy. ((−s<sub>1</sub> | s<sub>1</sub>) & t ≈ t ⇒ s<sub>1</sub> · y ≈ t). Then, we solve s<sub>2</sub> + x ≈ x′ for x, which yields the solution x = x′ − s<sub>2</sub> = εy. ((−s<sub>1</sub> | s<sub>1</sub>) & t ≈ t ⇒ s<sub>1</sub> · y ≈ t) − s<sub>2</sub>.
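A concrete 4-bit instance of this two-step solution, with hypothetical values s1 = 6, s2 = 3 and t = 12, where the choice term is realized by picking any witness of s1 · y ≈ t:

```python
W, MASK = 4, (1 << 4) - 1

def choice_mul(s1, t):
    # eps y. ((-s1 | s1) & t = t  =>  s1 * y = t): return a witness when the
    # invertibility condition holds, and an arbitrary value (0) otherwise.
    if ((-s1 & MASK) | s1) & t == t:
        return next(y for y in range(16) if (s1 * y) & MASK == t)
    return 0

s1, s2, t = 6, 3, 12
x = (choice_mul(s1, t) - s2) & MASK          # x = (eps y. ...) - s2
assert (s1 * ((s2 + x) & MASK)) & MASK == t  # s1 * (s2 + x) = t indeed holds
print("x =", x)
```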

Figure 1 describes in pseudo code the procedure to solve for x in an arbitrary literal ℓ[x] = e[x] ⋈ t that is linear in x. We assume that e[x] is built over the set of bit-vector operators listed in Table 1. Function solve recursively constructs a symbolic solution by computing (conditional) inverses as follows. Let function getInverse(x, ℓ[x]) return a term t′ that is the inverse of x in ℓ[x], i.e., such that ℓ[x] ⇔ x ≈ t′. Furthermore, let function getIC(x, ℓ[x]) return the invertibility condition φ<sub>c</sub> for x in ℓ[x]. If e[x] has the form ⋄(e<sub>1</sub>, ..., e<sub>n</sub>) with n > 0, x must

<sup>1</sup> All proofs can be found in an extended version of this paper [19].

```
solve(x, e[x] ⋈ t):
  If e = x
    If ⋈ ∈ {≈} then return t
    else return εy. (getIC(x, x ⋈ t) ⇒ y ⋈ t).
  else e = ⋄(e1, ..., ei[x], ..., en) with n > 0 and x ∉ FV(ej) for all j ≠ i.
    Let d[x′] = ⋄(e1, ..., ei−1, x′, ei+1, ..., en) where x′ is a fresh variable.
    If ⋈ ∈ {≈, ≉} and ⋄ ∈ {∼, −, +}
    then let t′ = getInverse(x′, d[x′] ≈ t) and return solve(x, ei ⋈ t′)
    else let φc = getIC(x′, d[x′] ⋈ t) and return solve(x, ei ≈ εy. (φc ⇒ d[y] ⋈ t)).
```
**Fig. 1.** Function solve for constructing a symbolic solution for x given a linear literal e[x] ⋈ t.

occur in exactly one of the subterms e<sub>1</sub>, ..., e<sub>n</sub> given that e is linear in x. Let d be the term obtained from e by replacing e<sub>i</sub> (the subterm containing x) with a fresh variable x′. We solve for subterm e<sub>i</sub>[x] (treating it as a variable x′) and compute an inverse getInverse(x′, d[x′] ≈ t), if it exists. Note that for a disequality e[x] ≉ t, it suffices to compute the inverse over equality and propagate the disequality down. (For example, for e<sub>i</sub>[x] + s ≉ t, we compute the inverse t′ = getInverse(x′, x′ + s ≈ t) = t − s and recurse on e<sub>i</sub>[x] ≉ t′.) If no inverse for e[x] ⋈ t exists, we first determine the invertibility condition φ<sub>c</sub> for d[x′] ⋈ t via getIC(x′, d[x′] ⋈ t), construct the choice function εy. (φ<sub>c</sub> ⇒ d[y] ⋈ t), and set it equal to e<sub>i</sub>[x], before recursively solving for x. If e[x] = x and the given literal is an equality, we have reached the base case and return t as the solution for x. Note that in Fig. 1, for simplicity we omitted one case in which an inverse can be determined, namely x · c ≈ t where c is an odd constant.
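The recursion of Fig. 1 can also be exercised concretely. The toy interpreter below (our sketch only: 4-bit values, equality only, and just the operators + and · with a constant argument) returns a witness value in place of the symbolic choice term, or None when the invertibility condition fails:

```python
# Terms are "x", ("add", e, c) or ("mul", e, c), denoting e + c and e * c (mod 16).
W, MASK = 4, (1 << 4) - 1

def solve(e, t):
    """Return some v with e[x := v] = t (mod 16), or None if no solution exists."""
    if e == "x":
        return t
    op, sub, c = e
    if op == "add":
        # + is unconditionally invertible: the inverse is t - c.
        return solve(sub, (t - c) & MASK)
    # For *, first check the invertibility condition (-c | c) & t = t ...
    if ((-c & MASK) | c) & t != t:
        return None
    # ... then realize the choice term eps y. (c * y = t) by any witness.
    y = next(y for y in range(16) if (c * y) & MASK == t)
    return solve(sub, y)

def evaluate(e, v):
    if e == "x":
        return v
    op, sub, c = e
    r = evaluate(sub, v)
    return (r + c) & MASK if op == "add" else (r * c) & MASK

lit = ("add", ("mul", "x", 5), 7)          # the literal (x * 5) + 7 = t
v = solve(lit, 12)
assert v is not None and evaluate(lit, v) == 12
assert solve(("mul", "x", 4), 3) is None   # 4 * x = 3 (mod 16) has no solution
```

As in the figure, the outermost operation is inverted first, and the recursion bottoms out when the remaining term is the variable itself.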

**Theorem 3.** *Let* ℓ[x] *be an* ε*-valid* Σ<sub>BV</sub>*-literal linear in* x*, and let* r = solve(x, ℓ[x])*. Then* r *is* ε*-valid,* FV(r) ⊆ FV(ℓ) \ {x} *and* ℓ[r] ⇔ ∃x. ℓ[x] *is* T<sub>BV</sub>*-valid.*

Tables 2 and 3 list the invertibility conditions for bit-vector operators {·, mod, ÷, &, |, >>, >><sub>a</sub>, <<, ◦} over relations {≈, ≉, <<sub>u</sub>, ><sub>u</sub>}. Due to space restrictions we omit the conditions for signed inequalities since they can be expressed in terms of unsigned inequality. We omit the invertibility conditions over {≤<sub>u</sub>, ≥<sub>u</sub>} since they can generally be constructed by combining the corresponding conditions for equality and inequality, although there might be more succinct equivalent conditions. Finally, we omit the invertibility conditions for operators {∼, −, +} and literals x ⋈ t over inequality since they are basic bounds checks, e.g., for x <<sub>s</sub> t we have t ≉ min<sub>s</sub>. The invertibility condition for x ≉ t and for the extract operator is ⊤.<sup>2</sup>

<sup>2</sup> All the omitted invertibility conditions can be found in the extended version of this paper [19].

The idea of computing the inverse of bit-vector operators has been used successfully in a recent local search approach for solving quantifier-free bit-vector constraints by Niemetz et al. [17]. There, target values are propagated via inverse value computation. In contrast, our approach does not determine single inverse values based on concrete assignments but aims at finding symbolic solutions through the generation of conditional inverses. In an extended version of that work [18], the same authors present rules for inverse value computation over equality but they provide no proof of correctness for them. We define invertibility conditions not only over equality but also disequality and (un)signed inequality, and verify their correctness up to a certain bit-width.

### **3.1 Synthesizing Invertibility Conditions**

We have defined invertibility conditions for all bit-vector operators in Σ<sub>BV</sub> for which no general inverse exists (162 in total). A noteworthy aspect of this work is that we were able to leverage syntax-guided synthesis (SyGuS) technology [1] to help identify these conditions. The problem of finding an invertibility condition for a literal of the form x ⋄ s ⋈ t (or, dually, s ⋄ x ⋈ t) linear in x can be recast as a SyGuS problem by asking whether there exists a binary Boolean function C such that the (second-order) formula ∃C ∀s ∀t. ((∃x. x ⋄ s ⋈ t) ⇔ C(s, t)) is satisfiable. If a SyGuS solver is able to synthesize the function C, then C can be used as the invertibility condition for x ⋄ s ⋈ t. To simplify the SyGuS problem we chose a bit-width of 4 for x, s, and t and eliminated the quantification over x in the formula above by expanding it to

$$\exists C \forall s \forall t. \left( \left( \bigvee\_{i=0}^{15} i \diamond s \bowtie t \right) \Leftrightarrow C(s, t) \right)$$

Since the search space for SyGuS solvers heavily depends on the input grammar (which defines the solution space for C), we decided to use two grammars with the same set of Boolean connectives but different sets of bit-vector operators:

$$\begin{aligned} O\_r &= \{ \neg, \land, \approx, <\_u, <\_s, 0, \text{min}\_s, \text{max}\_s, s, t, \sim, -, \&, | \} \\ O\_g &= \{ \neg, \land, \lor, \approx, <\_u, <\_s, \ge\_s, 0, \text{min}\_s, \text{max}\_s, s, t, \sim, +, -, \&, | \} \end{aligned}$$

The selection of constants in the grammar turned out to be crucial for finding solutions; e.g., by adding min<sub>s</sub> and max<sub>s</sub> we were able to synthesize substantially more invertibility conditions for signed inequalities. For each of the two sets of operators, we generated 140 SyGuS problems<sup>3</sup>, one for each combination of bit-vector operator ⋄ ∈ {·, mod, ÷, &, |, >>, >><sub>a</sub>, <<} over relation ⋈ ∈ {≈, ≉, <<sub>u</sub>, ≤<sub>u</sub>, ><sub>u</sub>, ≥<sub>u</sub>, <<sub>s</sub>, ≤<sub>s</sub>, ><sub>s</sub>, ≥<sub>s</sub>}, and used the SyGuS extension of the CVC4 solver [22] to solve these problems.

Using operators O<sub>r</sub> (O<sub>g</sub>) we were able to synthesize 98 (116) out of 140 invertibility conditions, with 118 unique solutions overall. When we found more than one solution for a condition (either with operators O<sub>r</sub> and O<sub>g</sub>, or manually) we chose the one that involved the smallest number of bit-vector operators. Thus, we ended up using 79 out of 118 synthesized conditions and 83 manually crafted conditions.

<sup>3</sup> Available at https://cvc4.cs.stanford.edu/papers/CAV2018-QBV/.

**Table 2.** Conditions for the invertibility of bit-vector operators over (dis)equality. Those for ·, & and | are given modulo commutativity of those operators.

In some cases, the SyGuS approach was able to synthesize invertibility conditions that were smaller than those we had manually crafted. For example, we manually defined the invertibility condition for x · s ≈ t as (t ≈ 0) ∨ ((t & −t) ≥<sub>u</sub> (s & −s) ∧ (s ≉ 0)). With SyGuS we obtained ((−s | s) & t) ≈ t. For some other cases, however, the synthesized solution involved more bit-vector operators than needed. For example, for x mod s ≉ t we manually defined the invertibility condition (s ≉ 1) ∨ (t ≉ 0), whereas SyGuS produced the solution ∼(−s) | t ≉ 0. For the majority of invertibility conditions, finding a solution did not require more than one hour of CPU time on an Intel Xeon E5-2637 with 3.5 GHz. Interestingly, the most time-consuming synthesis task (over 107 h of CPU time) was finding the condition ((t + t) − s) & s ≥<sub>u</sub> t for s mod x ≈ t. A small number of synthesized solutions were only correct for a bit-width of 4, e.g., the solution (∼s << s) <<sub>s</sub> t for x ÷ s <<sub>s</sub> t. In total, we found 6 width-dependent synthesized solutions, all of them for bit-vector operators ÷ and mod. For those, we used the manually crafted invertibility conditions instead.
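Both claims can be replayed by brute force at width 4 (an independent spot-check in Python, not the authors' verification setup): the manual and synthesized conditions for x · s ≈ t agree on all inputs, and the condition for s mod x ≈ t matches exhaustive search under the SMT-LIB convention s mod 0 = s:

```python
MASK = 15
bv = lambda v: v & MASK  # reduce an integer modulo 2^4

for s in range(16):
    for t in range(16):
        # Manual vs. synthesized condition for x * s = t.
        manual = t == 0 or ((t & bv(-t)) >= (s & bv(-s)) and s != 0)
        synth = (bv(bv(-s) | s) & t) == t
        assert manual == synth
        # Synthesized condition for s mod x = t vs. exhaustive search
        # (with the SMT-LIB semantics s mod 0 = s).
        exists = any((s % x if x != 0 else s) == t for x in range(16))
        assert exists == ((bv(bv(t + t) - s) & s) >= t)
print("both width-4 checks passed")
```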


**Table 3.** Conditions for the invertibility of bit-vector operators over unsigned inequality. Those for ·, & and | are given modulo commutativity of those operators.

### **3.2 Verifying Invertibility Conditions**

We verified the correctness of all 162 invertibility conditions for bit-widths from 1 to 65 by checking, for each bit-width, the T<sub>BV</sub>-unsatisfiability of the formula ¬(φ<sub>c</sub> ⇔ ∃x. ℓ[x]), where ℓ ranges over the literals in Tables 2 and 3 with s and t replaced by fresh constants, and φ<sub>c</sub> is the corresponding invertibility condition.
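As an illustration, one such instance (for multiplication over equality at width 32) can be emitted as an SMT-LIB 2 script; the small generator below is our rendering, not the authors' tooling:

```python
def mul_eq_benchmark(width):
    # not(cond <=> exists x. x * s = t): unsat iff the condition is correct at this width.
    return f"""(set-logic BV)
(declare-const s (_ BitVec {width}))
(declare-const t (_ BitVec {width}))
(define-fun cond () Bool (= (bvand (bvor (bvneg s) s) t) t))
(assert (not (= cond (exists ((x (_ BitVec {width}))) (= (bvmul x s) t)))))
(check-sat)"""

print(mul_eq_benchmark(32))
```

Any solver for quantified bit-vectors should answer unsat on this script; replaying it at each width from 1 to 65 mirrors the verification loop described above.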

In total, we generated 12,980 verification problems and used all participating solvers of the quantified bit-vector division of SMT-competition 2017 to verify them. For each solver/benchmark pair we used a CPU time limit of one hour and a memory limit of 8 GB on the same machines as those mentioned in the previous section. We consider an invertibility condition to be verified for a certain bit-width if at least one of the solvers was able to report unsatisfiable for the corresponding formula within the given time limit. Out of the 12,980 instances, we were able to verify 12,277 (94.6%).

Overall, all verification tasks (including timeouts) required a total of 275 days of CPU time. The success rate of each individual solver was 91.4% for Boolector, 85.0% for CVC4, 50.8% for Q3B, and 92% for Z3. We observed that on 30.6% of the problems, Q3B exited with a Python exception without returning any result. For bit-vector operators {∼, −, +, &, |, >>, >><sub>a</sub>, <<, ◦} over all relations, and for operators {·, ÷, mod} over relations {≈, ≤<sub>u</sub>, ≤<sub>s</sub>}, we were able to verify all invertibility conditions for all bit-widths in the range 1–65. Interestingly, no solver was able to verify the invertibility conditions for x mod s <<sub>s</sub> t with a bit-width of 54 and s mod x <<sub>u</sub> t with bit-widths 35–37 within the allotted time. We attribute this to the underlying heuristics used by the SAT solvers in these systems. All other conditions for <<sub>s</sub> and <<sub>u</sub> were verified for all bit-vector operators up to bit-width 65. The remaining conditions for operators {·, ÷, mod} over relations {≉, ><sub>u</sub>, ≥<sub>u</sub>, ><sub>s</sub>, ≥<sub>s</sub>} were verified up to at least a bit-width of 14. We discovered 3 conditions for s ÷ x ⋈ t with ⋈ ∈ {≈, ><sub>s</sub>, ≥<sub>s</sub>} that were not correct for a bit-width of 1. For each of these cases, we added an additional invertibility condition that correctly handles that case.

We leave to future work the task of formally proving that our invertibility conditions are correct for all bit-widths. Since this will most likely require the development of an interactive proof, we could leverage some recent work by Ekici et al. [9] that includes a formalization in the Coq proof assistant of the SMT-LIB theory of bit-vectors.

### **4 Counterexample-Guided Instantiation for Bit-Vectors**

In this section, we leverage techniques from the previous section for constructing symbolic solutions to bit-vector constraints to define a novel instantiation-based technique for quantified bit-vector formulas. We first briefly present the overall theory-independent procedure we use for quantifier instantiation and then show how it can be specialized to quantified bit-vectors using invertibility conditions.

We use a counterexample-guided approach for quantifier instantiation, as given by procedure CEGQI<sub>S</sub> in Fig. 2. To simplify the exposition here, we focus on input problems expressed as a single formula in prenex normal form and with up to one quantifier alternation. We stress, though, that the approach applies in general to arbitrary sets of quantified formulas in some Σ-theory T with a decidable quantifier-free fragment. The procedure checks via instantiation the T-satisfiability of a quantified input formula ϕ of the form ∃*y*∀*x*. ψ[*y*, *x*] where ψ is quantifier-free and *x* and *y* are possibly empty sequences of variables. It maintains an evolving set Γ, initially empty, of quantifier-free instances of the input formula. During each iteration of the procedure's loop, there are three possible cases: (1) if Γ is T-unsatisfiable, the input formula ϕ is also T-unsatisfiable and "unsat" is returned; (2) if Γ is T-satisfiable but not together with ¬ψ[*y*, *x*], the negated body of ϕ, then Γ entails ϕ in T, hence ϕ is T-satisfiable and "sat" is returned; (3) if neither of the previous cases holds, the procedure adds to Γ an instance of ψ obtained by replacing the variables *x* with some terms *t*, and continues. The procedure CEGQI<sub>S</sub> is parameterized by a *selection function* S that generates the terms *t*.
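The loop can be made concrete with a toy value-based instantiation over 4-bit vectors, where brute-force search stands in for the ground SMT solver; the paper's selection functions return symbolic terms instead of the plain counterexample values used here:

```python
MASK = 15

def cegqi(psi):
    """Decide ∃y∀x. psi(y, x) over 4-bit values by counterexample-guided instantiation."""
    gamma = []                                   # Γ: the instantiation terms chosen so far
    while True:
        # Step 1: find a model of Γ, i.e., some y satisfying every instance psi(y, t).
        y = next((y for y in range(16) if all(psi(y, t) for t in gamma)), None)
        if y is None:
            return "unsat"
        # Step 2: check Γ together with ¬psi: look for a counterexample x.
        x = next((x for x in range(16) if not psi(y, x)), None)
        if x is None:
            return "sat"                         # Γ entails ∀x. psi(y, x)
        gamma.append(x)                          # step 3: instantiate with t = x

assert cegqi(lambda y, x: (x & y) == x) == "sat"            # witnessed by y = 15
assert cegqi(lambda y, x: (x + y) & MASK == 0) == "unsat"   # no single y works for all x
print("toy CEGQI agrees on both formulas")
```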

**Definition 4 (Selection Function).** *A* selection function *takes as input a tuple of variables *x*, a model I of T, a quantifier-free Σ-formula ψ[*x*], and a set Γ of Σ-formulas such that *x* ⊆ FV(Γ) and I |= Γ ∪ {¬ψ}. It returns a tuple of ε-valid terms *t* of the same type as *x* such that FV(*t*) ⊆ FV(ψ) \ *x*.*

CEGQI<sup>S</sup>(∃*y*∀*x*. ψ[*y*, *x*])
Γ := ∅
Repeat:
1. If Γ is T-unsatisfiable, then return "unsat".
2. Otherwise, if Γ′ = Γ ∪ {¬ψ[*y*, *x*]} is T-unsatisfiable, then return "sat".
3. Otherwise, let I be a model of T and Γ′, let *t* = S(*x*, ψ, I, Γ′), and set Γ := Γ ∪ {ψ[*y*, *t*]}.

**Fig. 2.** A counterexample-guided quantifier instantiation procedure CEGQI*<sup>S</sup>* , parameterized by a selection function S, for determining the T-satisfiability of ∃*y*∀*x*. ψ[*y*, *x*] with ψ quantifier-free and F V(ψ) = *y* ∪ *x*.
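To make the loop of Fig. 2 concrete, here is a toy sketch of CEGQI (illustrative only, not the paper's implementation) over a 3-bit domain. A brute-force enumeration stands in for the theory solver, and the selection function simply returns the model value of the counterexample, as in configuration **m** described later:

```python
# A toy CEGQI loop (Fig. 2) for ∃y∀x. ψ(y, x) over 3-bit values.
# Brute-force search replaces the T-satisfiability checks; the
# selection function returns x's model value (the counterexample).

WIDTH = 3
DOMAIN = range(2 ** WIDTH)

def find_model(constraints):
    """Return some y satisfying all constraints in Γ, or None."""
    for y in DOMAIN:
        if all(c(y) for c in constraints):
            return y
    return None

def cegqi(psi):
    """Decide ∃y∀x. psi(y, x) by counterexample-guided instantiation."""
    gamma = []                       # evolving set of instances of ψ
    while True:
        y = find_model(gamma)        # step 1: is Γ T-satisfiable?
        if y is None:
            return "unsat"
        # step 2: is Γ together with ¬ψ[y, x] satisfiable, i.e. is
        # there a counterexample x for the current y?
        cex = next((x for x in DOMAIN if not psi(y, x)), None)
        if cex is None:
            return "sat", y          # Γ entails ∀x. ψ[y, x]
        # step 3: the model-value selection function returns x's value
        # in the counterexample model; add the instance ψ[y, cex].
        gamma.append(lambda y, t=cex: psi(y, t))

print(cegqi(lambda y, x: (x & y) == x))   # y = 7 absorbs every x: ('sat', 7)
print(cegqi(lambda y, x: x == y))         # no single y equals every x: unsat
```

Termination here is guaranteed only because the domain is finite; the point of the invertibility-condition-based selection functions below is to avoid this kind of value enumeration for large bit-widths.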

**Definition 5.** *Let* ψ[*x*] *be a quantifier-free* Σ*-formula. A selection function is:*


Procedure CEGQI<sup>S</sup> is refutation-sound and model-sound for any selection function S, and terminating for selection functions that are finite and monotonic.

**Theorem 6 (Correctness of** CEGQI<sup>S</sup> **).** *Let* <sup>S</sup> *be a selection function and let* <sup>ϕ</sup> <sup>=</sup> <sup>∃</sup>*y*∀*x*. ψ[*y*, *<sup>x</sup>*] *be a legal input for* CEGQI<sup>S</sup> *. Then the following hold.*


Thanks to this theorem, to define a T-satisfiability procedure for quantified Σ-formulas, it suffices to define a selection function satisfying the criteria of Definition 4. We do that in the following section for TBV .

#### **4.1 Selection Functions for Bit-Vectors**

In Fig. 3, we define a (class of) selection functions <sup>S</sup>BV <sup>c</sup> for quantifier-free bitvector formulas, which is parameterized by a *configuration* c, a value of the enumeration type {**m**, **<sup>k</sup>**, **<sup>s</sup>**, **<sup>b</sup>**}. The selection function collects in the set <sup>M</sup> all the literals occurring in <sup>Γ</sup> that are satisfied by <sup>I</sup>. Then, it collects in the set N a *projected form* of each literal in M. This form is computed by the function project<sup>c</sup> parameterized by configuration <sup>c</sup>. That function transforms its input literal into a form suitable for function solve from Fig. 1. We discuss the intuition for projection operations in more detail below.

After constructing the set N, the selection function computes a term t<sub>i</sub> for each variable x<sub>i</sub> in the tuple *x*, which we call the *solved form* of x<sub>i</sub>. To do that, it first constructs a set of literals N<sub>i</sub>, all linear in x<sub>i</sub>. It considers each literal ℓ[x<sub>1</sub>,...,x<sub>i−1</sub>] from N and replaces the previously solved variables x<sub>1</sub>,...,x<sub>i−1</sub> by their respective solved forms to obtain the literal ℓ′ = ℓ[t<sub>1</sub>,...,t<sub>i−1</sub>]. It then calls the function linearize on ℓ′, which returns a *set* of literals, each obtained by replacing all but one occurrence of x<sub>i</sub> in ℓ′ with the value of x<sub>i</sub> in I.<sup>4</sup>

S<sup>BV</sup><sub>c</sub>(*x*, ψ, I, Γ) where c ∈ {**m**, **k**, **s**, **b**}:
Let M = {ℓ | I |= ℓ, ℓ ∈ Lit(ψ)} and N = {project<sub>c</sub>(I, ℓ) | ℓ ∈ M}.
For i = 1,...,n, where *x* = (x<sub>1</sub>,...,x<sub>n</sub>):
Let N<sub>i</sub> = ⋃<sub>ℓ[x<sub>1</sub>,...,x<sub>i−1</sub>]∈N</sub> linearize(x<sub>i</sub>, I, ℓ[t<sub>1</sub>,...,t<sub>i−1</sub>]).
Let t<sub>i</sub> = solve(x<sub>i</sub>, choose(N<sub>i</sub>)) if N<sub>i</sub> is non-empty, and t<sub>i</sub> = x<sub>i</sub><sup>I</sup> otherwise.
Set t<sub>j</sub> := t<sub>j</sub>{x<sub>i</sub> → t<sub>i</sub>} for each j < i.
Return (t<sub>1</sub>,...,t<sub>n</sub>).

project<sub>**m**</sub>(I, s ⋈ t): return ⊤
project<sub>**k**</sub>(I, s ⋈ t): return s ⋈ t
project<sub>**s**</sub>(I, s ⋈ t): return s ≈ t + (s − t)<sup>I</sup>
project<sub>**b**</sub>(I, s ⋈ t): return s ≈ t if s<sup>I</sup> = t<sup>I</sup>, s ≈ t + 1 if s<sup>I</sup> > t<sup>I</sup>, and s ≈ t − 1 if s<sup>I</sup> < t<sup>I</sup>

**Fig. 3.** Selection functions S<sup>BV</sup><sub>c</sub> for quantifier-free bit-vector formulas. The procedure is parameterized by a configuration c, one of **m** (model value), **k** (keep), **s** (slack), or **b** (boundary).

*Example 7.* Consider an interpretation I where x<sup>I</sup> = 1, and Σ<sub>BV</sub>-terms a and b with x ∉ FV(a) ∪ FV(b). We have that linearize(x, I, x · (x + a) ≈ b) returns the set {1 · (x + a) ≈ b, x · (1 + a) ≈ b}; linearize(x, I, x ≥<sub>u</sub> a) returns the singleton set {x ≥<sub>u</sub> a}; and linearize(x, I, a ≈ b) returns the empty set.
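The behavior of linearize in Example 7 can be reproduced on a toy term representation. The sketch below is illustrative only (terms as nested tuples, not the solver's data structures): for each occurrence of x in the literal it emits one copy in which every other occurrence is replaced by x's model value.

```python
# A sketch of linearize from Fig. 3 on a tiny term AST: terms are
# ('var', name), ('const', c), or (op, subterm, subterm).

def occurrences(t, x):
    """Count occurrences of variable x in term/literal t."""
    if t[0] == 'var':
        return 1 if t[1] == x else 0
    if t[0] == 'const':
        return 0
    return sum(occurrences(s, x) for s in t[1:])

def keep_nth(t, x, n, val, counter):
    """Replace every occurrence of x by val except the n-th one."""
    if t[0] == 'var':
        if t[1] != x:
            return t
        counter[0] += 1
        return t if counter[0] - 1 == n else ('const', val)
    if t[0] == 'const':
        return t
    return (t[0],) + tuple(keep_nth(s, x, n, val, counter) for s in t[1:])

def linearize(x, val, lit):
    """One linear literal per occurrence of x; empty set if x absent."""
    return {keep_nth(lit, x, n, val, [0]) for n in range(occurrences(lit, x))}

x, a, b = ('var', 'x'), ('var', 'a'), ('var', 'b')
lit = ('eq', ('mul', x, ('add', x, a)), b)       # x · (x + a) ≈ b
for l in sorted(linearize('x', 1, lit), key=str):
    print(l)                                     # 1·(x+a) ≈ b and x·(1+a) ≈ b
print(linearize('x', 1, ('geu', x, a)))          # already linear: {x ≥u a}
print(linearize('x', 1, ('eq', a, b)))           # no occurrence: set()
```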

If the set N<sub>i</sub> is non-empty, the selection function heuristically chooses a literal from N<sub>i</sub> (indicated in Fig. 3 by choose(N<sub>i</sub>)). It then computes a solved form t<sub>i</sub> for x<sub>i</sub> by solving the chosen literal for x<sub>i</sub> with the function solve described in the previous section. If N<sub>i</sub> is empty, we let t<sub>i</sub> be simply the value of x<sub>i</sub> in the given model I. After that, x<sub>i</sub> is eliminated from all the previous terms t<sub>1</sub>,...,t<sub>i−1</sub> by replacing it with t<sub>i</sub>. After processing all n variables of *x*, the tuple (t<sub>1</sub>,...,t<sub>n</sub>) is returned.

The configurations of the selection function S<sup>BV</sup><sub>c</sub> determine how the literals in M are modified by the project<sub>c</sub> function prior to computing solved forms, based on the current model I. With the *model value* configuration **m**, the selection function effectively ignores the structure of all literals in M and (because the set N<sub>i</sub> is empty) ends up choosing the value x<sub>i</sub><sup>I</sup> as the solved form of variable

<sup>4</sup> This is a simple heuristic to generate literals that can be solved for xi. More elaborate heuristics could be used in practice.

x<sub>i</sub>, for each i. On the other end of the spectrum, the configuration **k** *keeps* all literals in M unchanged. The remaining two configurations affect how disequalities and inequalities are handled by project<sub>c</sub>. With configuration **s**, project<sub>c</sub> normalizes any kind of literal (equality, inequality, or disequality) s ⋈ t to an equality by adding the *slack* value (s − t)<sup>I</sup> to t. With configuration **b**, it maps equalities to themselves and maps inequalities and disequalities to an equality corresponding to a *boundary point* of the relation between s and t, based on the current model. Specifically, it adds one to t if s is greater than t in I, subtracts one if s is smaller than t, and returns s ≈ t if their values are the same. These two configurations are inspired by quantifier elimination techniques for linear arithmetic [5,15]. In the following, we provide an end-to-end example of our technique for quantifier instantiation that makes use of the selection function S<sup>BV</sup><sub>c</sub>.
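The four projection operations can be sketched concretely on 8-bit unsigned model values. The representation below is purely illustrative (triples stand for literals, and `('t+', c)` / `('t-', c)` mark offset terms):

```python
# The four project_c operations of Fig. 3, sketched on 8-bit unsigned
# model values s^I and t^I of a literal s ⋈ t.

WIDTH = 8
MASK = (1 << WIDTH) - 1          # arithmetic is modulo 2^WIDTH

def project(config, s_val, t_val, rel):
    if config == 'm':            # model value: drop the literal's
        return None              # structure entirely (N_i stays empty)
    if config == 'k':            # keep: pass the literal on unchanged
        return (rel, 's', 't')
    if config == 's':            # slack: s ≈ t + (s − t)^I
        return ('eq', 's', ('t+', (s_val - t_val) & MASK))
    if config == 'b':            # boundary point of the relation
        if s_val == t_val:
            return ('eq', 's', 't')
        if s_val > t_val:
            return ('eq', 's', ('t+', 1))
        return ('eq', 's', ('t-', 1))

# A disequality s ≉ t under a model with s^I = 5 and t^I = 3:
print(project('s', 5, 3, 'distinct'))   # slack:    s ≈ t + 2
print(project('b', 5, 3, 'distinct'))   # boundary: s ≈ t + 1
```

Both configurations turn the disequality into an equality that solve can handle, but **s** preserves the exact model distance while **b** moves only to the nearest boundary of the relation.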

*Example 8.* Consider the formula ϕ = ∃*y*.∀x<sub>1</sub>. (x<sub>1</sub> · a ≤<sub>u</sub> b), where a and b are terms with no free occurrences of x<sub>1</sub>. To determine the satisfiability of ϕ, we invoke CEGQI<sup>S</sup> with selection function S<sup>BV</sup><sub>c</sub> on ϕ for some configuration c. Say that in the first iteration of the loop we find that Γ′ = Γ ∪ {x<sub>1</sub> · a ><sub>u</sub> b} is satisfied by some model I of T<sub>BV</sub> such that x<sub>1</sub><sup>I</sup> = 1, a<sup>I</sup> = 1, and b<sup>I</sup> = 0. We invoke S<sup>BV</sup><sub>c</sub>((x<sub>1</sub>), ψ, I, Γ′) and first compute M = {x<sub>1</sub> · a ><sub>u</sub> b}, the set of literals of Γ′ that are satisfied by I. The table below summarizes the values of the internal variables of S<sup>BV</sup><sub>c</sub> for the various configurations:


In each case, S<sup>BV</sup><sub>c</sub> returns the tuple (t<sub>1</sub>), and we add the instance t<sub>1</sub> · a ≤<sub>u</sub> b to Γ. Consider configuration **k**, where t<sub>1</sub> is the choice expression εz. ((a <<sub>u</sub> −b | b) ⇒ z · a ><sub>u</sub> b). Since t<sub>1</sub> is ε-valid, due to the semantics of ε this instance is equivalent to:

$$((a <_u -b \mid b) \Rightarrow k \cdot a >_u b) \land k \cdot a \leq_u b \tag{1}$$

for a fresh variable k. This formula is T<sub>BV</sub>-satisfiable if and only if ¬(a <<sub>u</sub> −b | b) is T<sub>BV</sub>-satisfiable. In the second iteration of the loop in CEGQI<sup>S</sup> with S<sup>BV</sup><sub>c</sub>, the set Γ contains formula (1) above. We have two possible outcomes:


In fact, we argue later that quantified bit-vector formulas like ϕ above, which contain only one occurrence of a universal variable, require at most one instantiation before CEGQI<sup>S</sup> with S<sup>BV</sup><sub>**k**</sub> terminates. The same guarantee does not hold for the other configurations. In particular, configuration **m** generates the instantiation where t<sub>1</sub> is 1, which simplifies to a ≤<sub>u</sub> b. This may not be sufficient to show that Γ or Γ′ is unsatisfiable in the second iteration of the loop, and the algorithm may resort to *enumerating* a repeating pattern of instantiations, such as x<sub>1</sub> → 1, 2, 3, and so on. This obviously does not scale for problems with large bit-widths.

More generally, we note that CEGQI<sup>S</sup> with S<sup>BV</sup><sub>**k**</sub> terminates with at most one instance for input formulas whose body has just one literal and a single occurrence of each universal variable. The same guarantee does not hold, for instance, for quantified formulas whose body has multiple disjuncts. For some intuition, consider extending the second conjunct of (1) with an additional disjunct ℓ[k], i.e., (k · a ≤<sub>u</sub> b ∨ ℓ[k]). A model can be found for this formula in which the invertibility condition (a <<sub>u</sub> −b | b) is still satisfied, and hence we are not guaranteed to terminate on the second iteration of the loop. Similarly, if the literals of the input formula have multiple occurrences of x<sub>1</sub>, then multiple instances may be returned by the selection function, since the literals returned by linearize in Fig. 3 depend on the model value of x<sub>1</sub>, and hence more than one possible instance may be considered in the loop in Fig. 2.

The following theorem summarizes the properties of our selection functions. In the following, we say a quantified formula is *unit linear invertible* if it is of the form ∀x. ℓ[x], where ℓ is linear in x and has an invertibility condition for x. We say a selection function is n*-finite* for a quantified formula ψ if the number of possible instantiations it returns is at most n, for some positive integer n.

**Theorem 9.** *Let* ψ[*x*] *be a quantifier-free formula in the signature of* T<sub>BV</sub>*.*


This theorem implies that counterexample-guided instantiation using configuration S<sup>BV</sup><sub>**m**</sub> is a decision procedure for quantified bit-vectors. However, in practice the worst-case number of instances considered by this configuration for a variable x<sub>[n]</sub> is proportional to the number of its possible values (2<sup>n</sup>), which is practically infeasible for sufficiently large n. More interestingly, counterexample-guided instantiation using S<sup>BV</sup><sub>**k**</sub> is a decision procedure for quantified formulas that are unit linear invertible, with the additional guarantee that at most one instantiation is returned by this selection function. Hence, formulas in this fragment can be effectively reduced to quantifier-free bit-vector constraints in at most two iterations of the loop of procedure CEGQI<sup>S</sup> in Fig. 2.

#### **4.2 Implementation**

We implemented the new instantiation techniques described in this section as an extension of CVC4, a DPLL(T)-based SMT solver [20] that supports quantifier-free bit-vector constraints, (arbitrarily nested) quantified formulas, and choice expressions. For the latter, all choice expressions εx. ϕ[x] are eliminated from assertions by replacing them with a fresh variable k of the same type and adding ϕ[k] as a new assertion; note that this is sound since all choice expressions we consider are ε-valid. In the remainder of the paper, we refer to our extension of the solver as **cegqi**. In the following, we discuss important implementation details of the extension.

*Handling Duplicate Instantiations.* The selection functions S<sup>BV</sup><sub>**s**</sub> and S<sup>BV</sup><sub>**b**</sub> are not guaranteed to be monotonic, and neither is S<sup>BV</sup><sub>**k**</sub> for quantified formulas that contain more than one occurrence of universal variables. Hence, when applying these strategies to arbitrary quantified formulas, we use a two-tiered strategy that invokes S<sup>BV</sup><sub>**m**</sub> as a second resort if the instance for the terms returned by a selection function already exists in Γ.

*Linearizing Rewrites.* Our selection function in Fig. 3 uses the function linearize to compute literals that are linear in the variable x<sub>i</sub> being solved for. The way we presently implement linearize makes those literals dependent on the value of x<sub>i</sub> in the current model I, with the risk of overfitting to that model. To address this limitation, we use a set of equivalence-preserving rewrite rules whose goal is to reduce the number of occurrences of x<sub>i</sub> to one when possible, by applying basic algebraic manipulations. As a trivial example, a literal like x<sub>i</sub> + x<sub>i</sub> ≈ a is first rewritten to 2 · x<sub>i</sub> ≈ a, which is linear in x<sub>i</sub> if a does not contain x<sub>i</sub>. In that case, this literal, and hence the original one, has an invertibility condition as discussed in Sect. 3.
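As a toy illustration of such a rewrite (a sketch, not CVC4's actual rewriter), collecting the coefficient of the solved variable in a sum of monomials reduces its occurrences to one:

```python
# Collect the coefficient of x in a sum of (coefficient, monomial)
# pairs over 8-bit arithmetic, turning x + x + a into 2·x + a.

MASK = 0xFF  # coefficients are taken modulo 2^8

def collect(term, x):
    """Rewrite c1·x + c2·x + rest into ((c1 + c2) mod 2^8)·x + rest."""
    coeff = sum(c for c, m in term if m == x) & MASK
    rest = [(c, m) for c, m in term if m != x]
    return coeff, rest

# x + x + a  ==>  2·x + a, which is linear in x
print(collect([(1, 'x'), (1, 'x'), (1, 'a')], 'x'))   # (2, [(1, 'a')])
```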

*Variable Elimination.* We use the procedure solve from Sect. 3 not only for selecting quantifier instantiations, but also for eliminating variables from quantified formulas. In particular, for a quantified formula of the form ∀x*y*. ℓ ⇒ ϕ[x, *y*], if ℓ is linear in x and solve(x, ℓ) returns a term s containing no ε-expressions, we can replace this formula by ∀*y*. ϕ[s, *y*]. When ℓ is an equality, this is sometimes called destructive equality resolution (DER) and is an important implementation-level optimization in state-of-the-art bit-vector solvers [25]. As shown in Fig. 1, we use the getInverse function to increase the likelihood that solve returns a term that contains no ε-expressions.

*Handling Extract.* Consider the formula ∀x<sub>[32]</sub>. (x[31 : 16] ≈ a<sub>[16]</sub> ∨ x[15 : 0] ≈ b<sub>[16]</sub>). Since all invertibility conditions for the extract operator are ⊤, rather than producing choice expressions we have found it more effective to eliminate extracts via rewriting. As a consequence, we independently solve constraints for *regions* of quantified variables when they appear beneath applications of extract operations. In this example, we let the solved form of x be y<sub>[16]</sub> ◦ z<sub>[16]</sub>, where y and z are fresh variables, and subsequently solve for these variables in y ≈ a and z ≈ b. Hence, we may instantiate x with a ◦ b, a term that we would not have found by considering the two literals independently in the negated body of the formula above.
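The region-splitting step rests on plain extract/concat arithmetic, which can be checked on Python integers (the concrete 16-bit values below are chosen only for illustration):

```python
# Extract and concatenation on unsigned integers: splitting a 32-bit
# x into 16-bit halves y ∘ z and instantiating x with a ∘ b satisfies
# x[31:16] ≈ a and x[15:0] ≈ b simultaneously.

def extract(v, hi, lo):
    """Bit-vector extract v[hi:lo] as an unsigned integer."""
    return (v >> lo) & ((1 << (hi - lo + 1)) - 1)

def concat(hi_part, lo_part, lo_width):
    """Bit-vector concatenation hi_part ∘ lo_part."""
    return (hi_part << lo_width) | lo_part

a, b = 0xBEEF, 0xCAFE          # arbitrary 16-bit values
x = concat(a, b, 16)           # instantiate x with a ∘ b
assert extract(x, 31, 16) == a and extract(x, 15, 0) == b
print(hex(x))                  # 0xbeefcafe
```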

### **5 Evaluation**

We implemented our techniques in the solver **cegqi** and considered four configurations **cegqi<sup>c</sup>**, where c is one of {**m**, **k**, **s**, **b**}, corresponding to the four selection function configurations described in Sect. 4. Of these, **cegqi<sup>m</sup>** is the only one that does not employ our new techniques, using only model values for instantiation; it can thus be considered our base configuration. All configurations enable the optimizations described in Sect. 4.2 when applicable. We compared them against all entrants of the quantified bit-vector division of the 2017 SMT competition SMT-COMP: Boolector [16], CVC4 [2], Q3B [14], and Z3 [6]. With the exception of Q3B, all of these solvers are related to our approach, since they are instantiation-based; however, none of them utilizes invertibility conditions when constructing instantiations. We ran all experiments on the StarExec logic solving service [24] with a 300 s CPU and wall-clock time limit and a 100 GB memory limit.

We evaluated our approach on all 5,151 benchmarks from the quantified bit-vector logic (BV) of SMT-LIB [3]. The results are summarized in Table 4. Configuration **cegqi<sup>b</sup>** solves the highest number of unsatisfiable benchmarks (4,399), which is 30 more than the next best configuration **cegqi<sup>s</sup>** and 37 more than


**Table 4.** Results on satisfiable and unsatisfiable benchmarks with a 300 s timeout.

the next best external solver, Z3. Compared to the instantiation-based solvers Boolector, CVC4 and Z3, the performance of **cegqi<sup>b</sup>** is particularly strong on the h-uauto family, which are verification conditions from the Ultimate Automizer tool [11]. For satisfiable benchmarks, Boolector solves the most (581), which is 36 more than our best configuration **cegqib**.

Overall, our best configuration **cegqi<sup>b</sup>** solved 335 more benchmarks than our base configuration **cegqim**. A more detailed runtime comparison between the two is provided by the scatter plot in Fig. 4. Moreover, **cegqi<sup>b</sup>** solved 24 more benchmarks than the best external solver, Z3. In terms of uniquely solved instances, **cegqi<sup>b</sup>** was able to solve 139 benchmarks that were not solved by Z3, whereas Z3 solved 115 benchmarks that **cegqi<sup>b</sup>** did not. Overall, **cegqi<sup>b</sup>** was able to solve 21 of the 79 benchmarks (26.6%) not solved by any of the other solvers. For 18 of these 21 benchmarks, it terminated after considering no more than 4 instantiations. These cases indicate that using symbolic terms for instantiation solves problems for which other techniques, such as those that enumerate instantiations based on model values, do not scale.

Interestingly, configuration **cegqi<sup>k</sup>**, despite having the strong guarantees given by Theorem 9, performed relatively poorly on this set (with 4,571 solved instances overall). We attribute this to the fact that most of the quantified formulas in this set are not unit linear invertible. In total, we found that only 25.6% of the formulas considered during solving were unit linear invertible. However, only a handful of benchmarks were such that *all* quantified formulas in the problem were unit linear invertible. This might explain the superior performance of **cegqi<sup>s</sup>** and **cegqi<sup>b</sup>**, which use invertibility conditions but in a less monolithic way.

For some intuition on this, consider the problem ∀x. (x > a ∨ x < b), where a and b are such that a > b is T<sub>BV</sub>-valid. Intuitively, showing that this formula is unsatisfiable requires the solver to find an x between b and a. This is apparent when considering the dual problem ∃x. (x ≤ a ∧ x ≥ b). Configuration **cegqi<sup>b</sup>** is capable of finding such an x, for instance by considering the instantiation x → a when solving for the boundary point of the first disjunct. Configuration **cegqi<sup>k</sup>**, on the other hand, would instead consider the instantiation of x for two terms that witness ε-expressions: some k<sub>1</sub> that is

**Fig. 4.** Configuration **cegqi<sup>m</sup>** vs. **cegqi<sup>b</sup>**.

never smaller than a, and some k<sub>2</sub> that is never greater than b. Neither of these terms necessarily resides between a and b, since the solver may subsequently consider models where k<sub>1</sub> > b and k<sub>2</sub> < a. This points to a potential use for invertibility conditions that solve multiple literals simultaneously, something we are currently investigating.

### **6 Conclusion**

We have presented a new class of strategies for solving quantified bit-vector formulas based on invertibility conditions. We have derived invertibility conditions for the majority of operators in a standard theory of fixed-width bit-vectors. An implementation based on this approach solves over 25% of previously unsolved verification benchmarks from SMT-LIB, and outperforms all other state-of-the-art bit-vector solvers overall.

In future work, we plan to develop a framework in which the correctness of invertibility conditions can be formally established independently of bit-width. We are working on deriving invertibility conditions that are optimal for linear constraints, in the sense of admitting the simplest propositional encoding. We are also investigating conditions that cover additional bit-vector operators, some cases of non-linear literals, as well as conditions that cover multiple constraints. While this is a challenging task, we believe efficient syntax-guided synthesis solvers can continue to help push progress in this direction. Finally, we plan to investigate the use of invertibility conditions for performing quantifier elimination on bit-vector constraints. This will require a procedure for deriving concrete witnesses from choice expressions.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Understanding and Extending Incremental Determinization for 2QBF**

Markus N. Rabe1(B) , Leander Tentrup<sup>2</sup>, Cameron Rasmussen<sup>1</sup>, and Sanjit A. Seshia<sup>1</sup>

> <sup>1</sup> University of California, Berkeley, Berkeley, USA rabe@berkeley.edu <sup>2</sup> Saarland University, Saarbrücken, Germany

**Abstract.** Incremental determinization is a recently proposed algorithm for solving quantified Boolean formulas with one quantifier alternation. In this paper, we formalize incremental determinization as a set of inference rules to help understand the design space of similar algorithms. We then present additional inference rules that extend incremental determinization in two ways. The first extension integrates the popular CEGAR principle and the second extension allows us to analyze different cases in isolation. The experimental evaluation demonstrates that the extensions significantly improve the performance.

### **1 Introduction**

Solving quantified Boolean formulas (QBFs) is one of the core challenges in automated reasoning and is particularly important for applications in verification and synthesis. For example, program synthesis with syntax guidance [1,2] and the synthesis of reactive controllers from LTL specifications [3,4] have been encoded in QBF. Many of these problems require only formulas with one quantifier alternation (2QBF), which are the focus of this paper.

Algorithms for QBF and program synthesis largely rely on the counterexample-guided inductive synthesis principle (CEGIS) [5], originating in abstraction refinement (CEGAR) [6,7]. For example, for program synthesis, CEGIS-style algorithms alternate between generating candidate programs and checking them for counterexamples, which allows us to lift arbitrary verification approaches to synthesis algorithms. Unfortunately, this approach often degenerates into a plain guess-and-check loop when counterexamples cannot be generalized effectively. This carries over to the simpler setting of 2QBF. For example, even for a simple formula such as ∀x.∃y. x = y, where x and y are 32-bit numbers, most QBF algorithms simply enumerate all 2<sup>32</sup> pairs of assignments. In fact, even modern QBF solvers diverge on this formula when preprocessing is deactivated.

Recently, Incremental Determinization (ID) has been suggested to overcome this problem [8]. ID represents a departure from the CEGIS approach in that it

© The Author(s) 2018 H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 256–274, 2018. https://doi.org/10.1007/978-3-319-96142-2_17

is structured around identifying which variables have unique Skolem functions. (To prove the truth of a 2QBF ∀x.∃y. ϕ we have to find Skolem functions f mapping x to y such that ϕ[f /y] is valid.) After assigning Skolem functions to a few of the existential variables, the propagation procedure determines Skolem functions for other variables that are uniquely implied by that assignment. When the assignment of Skolem functions turns out to be incorrect, ID analyzes the conflict, derives a conflict clause, and backtracks some of the assignments. In other words, ID lifts CDCL to the space of Skolem functions.

ID can solve the simple example given above and shows good performance on various application benchmarks. Yet, the QBF competitions have shown that the relative performance of ID and CEGIS still varies a lot between benchmarks [9]. A third family of QBF solvers, based on the *expansion* of universal variables [10–12], shows yet again different performance characteristics and outperforms both ID and CEGIS on some (few) benchmarks. This variety of performance characteristics of different approaches indicates that current QBF solvers could be significantly improved by integrating the different reasoning principles.

In this paper, we first formalize and generalize ID [8] (Sect. 3). This helps us to disentangle the working principles of the algorithm from implementation-level design choices. Thereby our analysis of ID enables a systematic and principled search for better algorithms for quantified reasoning. To demonstrate the value and flexibility of the formalization, we present two extensions of ID that integrate CEGIS-style inductive reasoning (Sect. 4) and expansion (Sect. 5). In the experimental evaluation we demonstrate that both extensions significantly improve the performance compared to plain ID (Sect. 6).

*Related Work.* This work is written in the tradition of works such as the Model Evolution Calculus [13], AbstractDPLL [14], MCSAT [15], and recent calculi for QBF [16], which present search algorithms as inference rules to enable the study and extension of these algorithms. ID and the inference rules presented in this paper can be seen as an instantiation of the more general frameworks, such as MCSAT [15] or Abstract Conflict Driven Learning [17].

Like ID, quantified conflict-driven clause learning (QCDCL) lifts CDCL to QBF [18,19]. The approaches differ in that QCDCL does not reason about functions, but only about values of variables. Fazekas et al. have formalized QCDCL as inference rules [16].

2QBF solvers based on CEGAR/CEGIS search for universal assignments and matching existential assignments using two SAT solvers [5,20,21]. There are several generalizations of this approach to QBF with more than one quantifier alternation [22–26].

### **2 Preliminaries**

Quantified Boolean formulas over a finite set of variables x ∈ X with domain <sup>B</sup> <sup>=</sup> {**0**, **<sup>1</sup>**} are generated by the following grammar:

$$\varphi \coloneqq \mathbf{0} \mid \mathbf{1} \mid x \mid \neg \varphi \mid (\varphi) \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid \exists x. \varphi \mid \forall x. \varphi$$

We consider all other logical operations, including implication, XOR, and equality as syntactic sugar with the usual definitions. We abbreviate multiple quantifications Qx1.Qx2. . . . Qx*n*.ϕ using the same quantifier Q ∈ {∀, ∃} by the quantification over the set of variables X = {x1,...,x*n*}, denoted as QX.ϕ.

An *assignment* *x* to a set of variables X is a function *x* : X → B that maps each variable x ∈ X to either **1** or **0**. Given a propositional formula ϕ over variables X and an assignment *x*′ to X′ ⊆ X, we define ϕ(*x*′) to be the formula obtained by replacing the variables X′ by their truth values in *x*′. By ϕ(*x*′, *x*″) we denote the replacement by multiple assignments for disjoint sets X′, X″ ⊆ X.

A quantifier Q x. ϕ for Q ∈ {∃, ∀} *binds* the variable x in its subformula ϕ and we assume w.l.o.g. that every variable is bound at most once in any formula. A *closed* QBF is a formula in which all variables are bound. We define the dependency set of an existentially quantified variable y in a formula ϕ as the set *dep*(y) of universally quantified variables x such that ϕ's subformula ∃y.ψ is a subformula of ϕ's subformula ∀x.ψ′. A *Skolem function* f*<sup>y</sup>* maps assignments to *dep*(y) to a truth value. We define the truth of a QBF ϕ as the existence of Skolem functions f*<sup>Y</sup>* = {f*<sup>y</sup>*<sup>1</sup> ,...,f*<sup>y</sup><sup>n</sup>* } for the existentially quantified variables Y = {y<sub>1</sub>,...,y<sub>n</sub>}, such that ϕ(*x*, f*<sup>Y</sup>* (*x*)) holds for every *x*, where f*<sup>Y</sup>* (*x*) is the assignment to Y that the Skolem functions f*<sup>Y</sup>* provide for *x*.
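For small domains this definition of truth can be checked by brute force. The sketch below (illustrative only, over 3-bit values) tests whether a candidate function is a Skolem function for a 2QBF ∀x.∃y. ϕ, using the x = y example from the introduction:

```python
# Checking a candidate Skolem function f for ∀x.∃y. ϕ over 3-bit
# values: the QBF is true iff ϕ(x, f(x)) holds for every x.

DOMAIN = range(8)  # all 3-bit assignments to x

def is_skolem(f, phi):
    """Does f witness the truth of ∀x.∃y. phi(x, y)?"""
    return all(phi(x, f(x)) for x in DOMAIN)

# For ∀x.∃y. x = y the identity function is a Skolem function:
print(is_skolem(lambda x: x, lambda x, y: x == y))   # True
# A constant function is not:
print(is_skolem(lambda x: 0, lambda x, y: x == y))   # False
```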

A formula is in *prenex normal form*, if the formula is closed and starts with a sequence of quantifiers followed by a propositional subformula. A formula ϕ is in the <sup>k</sup>QBF fragment for <sup>k</sup> <sup>∈</sup> <sup>N</sup><sup>+</sup> if it is closed, in prenex normal form, and has exactly k − 1 alternations between ∃ and ∀ quantifiers.

A *literal* l is either a variable x ∈ X or its negation ¬x. Given a set of literals {l1,...,ln}, their disjunction (l1 ∨ ... ∨ ln) is called a *clause* and their conjunction (l1 ∧ ... ∧ ln) is called a *cube*. We use l̄ to denote the literal that is the logical negation of l. We denote the variable of a literal by var(l) and lift the notion to clauses: var(l1 ∨ ... ∨ ln) = {var(l1),..., var(ln)}.

A propositional formula is in conjunctive normal form (CNF), if it is a conjunction of clauses. A prenex QBF is in prenex conjunctive normal form (PCNF) if its propositional subformula is in CNF. Every QBF ϕ can be transformed into an equivalent PCNF with size O(|ϕ|) [27].

*Resolution* is a well-known proof rule that allows us to merge two clauses as follows. Given two clauses C1 ∨ v and C2 ∨ ¬v, we call C1 ⊗_v C2 = C1 ∨ C2 their *resolvent* with pivot v. The resolution rule states that C1 ∨ v and C2 ∨ ¬v imply their resolvent. Resolution is refutationally complete for propositional Boolean formulas, i.e., for every propositional Boolean formula that is equivalent to false we can derive the empty clause.
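As a sketch, the resolution rule fits in a few lines of Python; the encoding of clauses as sets of signed (DIMACS-style) integers is our own choice for illustration:

```python
def resolve(c1, c2, pivot):
    """Resolvent of (c1 ∨ pivot) and (c2 ∨ ¬pivot) with pivot v:
    drop both pivot literals and take the union of what remains."""
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

# Resolving (x1 ∨ x2) and (¬x1 ∨ x3) on pivot x1 yields (x2 ∨ x3).
resolvent = resolve(frozenset({1, 2}), frozenset({-1, 3}), 1)
```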

For *quantified* Boolean formulas, however, we need additional proof rules. The two most prominent ways to make resolution complete for QBF are to add either *universal reduction* or *universal expansion*, leading to the proof systems Q-resolution [28] and ∀Exp-Res [10,29], respectively.

*Universal expansion* eliminates a single universal variable by creating two copies of the subformula of its quantifier. Let Q1.∀x.Q2. ϕ be a QBF in PCNF, where Q1 and Q2 each are a sequence of quantifiers, and let Q2 quantify over the variables X. Universal expansion yields the *equivalent* formula Q1.Q2.Q2′. ϕ[**1**/x, X′/X] ∧ ϕ[**0**/x], where Q2′ is a copy of Q2 but quantifying over a fresh set of variables X′ instead of X. The term ϕ[**1**/x, X′/X] denotes ϕ where x is replaced by **1** and the variables X are replaced by their counterparts in X′.
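A minimal sketch of this expansion on the propositional matrix, assuming clauses are sets of signed integers and `fresh` supplies unused variable indices (both assumptions are ours, not the paper's notation):

```python
from itertools import count

def expand_universal(clauses, x, inner_vars, fresh):
    """Expand universal variable x: conjoin the x=1 cofactor (with the
    inner variables renamed to fresh copies) and the x=0 cofactor."""
    def cofactor(cs, true_lit, rename):
        out = []
        for c in cs:
            if true_lit in c:      # clause satisfied by x, drop it
                continue
            out.append(frozenset(rename.get(l, l) for l in c if l != -true_lit))
        return out
    renaming = {}
    for v in inner_vars:           # build X -> X' renaming for both polarities
        nv = next(fresh)
        renaming[v], renaming[-v] = nv, -nv
    return cofactor(clauses, x, renaming) + cofactor(clauses, -x, {})

# (x ∨ y2) ∧ (¬x ∨ y3) with x = 1, inner variables {y2, y3}:
expanded = expand_universal([frozenset({1, 2}), frozenset({-1, 3})],
                            1, [2, 3], count(4))
```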

*Universal reduction* allows us to drop universal variables from clauses when none of the existential variables in that clause may depend on them. Let C be a clause of a QBF and let l be a literal of a universally quantified variable in C. Let us further assume that l̄ does not occur in C. If for all existential variables v in C we have var(l) ∉ dep(v), universal reduction allows us to remove l from C. The resulting formula is equivalent to the original formula.
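Universal reduction can be sketched as follows, assuming `dep` maps each existential variable to its dependency set (the signed-integer clause encoding is our own illustration):

```python
def universal_reduce(clause, universals, dep):
    """Drop a universal literal l from the clause if no existential
    variable occurring in the clause depends on var(l)."""
    keep = set(clause)
    for l in list(keep):
        v = abs(l)
        if v in universals and -l not in keep:
            existentials = [abs(m) for m in keep if abs(m) not in universals]
            if all(v not in dep[e] for e in existentials):
                keep.discard(l)
    return frozenset(keep)

# (x1 ∨ y ∨ ¬x3) where the existential y depends only on x3:
# x1 can be reduced away, ¬x3 must stay.
reduced = universal_reduce(frozenset({1, 2, -3}), {1, 3}, {2: {3}})
```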

*Stack.* For convenience, we use a stack data structure to describe the algorithm. Formally, a stack is a finite sequence. Given a stack S, we use S(i) to denote the i-th element of the stack, starting with index 0, and we use S.S′ to denote the concatenation of stacks S and S′. We use S[0, i] to denote the prefix of S up to element i. All stacks we consider are stacks of sets. In a slight abuse of notation, we also use a stack to denote the union of its elements when this is clear from the context. We also introduce an operation specific to stacks of sets: we define add(S, i, x) to be the stack that results from extending the set on level i by the element x.
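As a sketch, these stack operations can be modeled over Python lists of sets; the helper names `prefix` and `union` are our own:

```python
def add(stack, i, x):
    """add(S, i, x): extend the set on level i of a stack of sets by x."""
    return stack[:i] + [stack[i] | {x}] + stack[i + 1:]

def prefix(stack, i):
    """S[0, i]: the prefix of the stack up to (and including) element i."""
    return stack[:i + 1]

def union(stack):
    """The stack read as the union of its elements."""
    return set().union(*stack) if stack else set()

S = [{'a'}, {'b'}]          # a stack of height 2
S2 = add(S, 1, 'c')         # extend the top level by 'c'
```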

### **2.1 Unique Skolem Functions**

Incremental determinization builds on the notion of unique Skolem functions. Let ∀X.∃Y. ϕ be a 2QBF in PCNF and let χ be a formula over X characterizing the *domain* of the Skolem functions we are currently interested in. We say that a variable v ∈ Y has a *unique Skolem function* for domain χ if for each assignment x with χ(x) there is a *unique* assignment *v* to v such that ϕ(x, *v*) is satisfiable. In particular, a unique Skolem function is a Skolem function:

**Lemma 1.** *If all existential variables have a unique Skolem function for the full domain* χ = **1***, the formula is true.*

The semantic characterization of unique Skolem functions above does not help us with the computation of Skolem functions directly. We now introduce a local approximation of unique Skolem functions and show how it can be used as a propagation procedure.

We consider a set of variables D ⊆ X ∪ Y with D ⊇ X and focus on the subset ϕ|D of clauses that only contain variables in D. We further assume that the existential variables in D already have unique Skolem functions for χ in the formula ϕ|D. We now define how to extend D by an existential variable v ∉ D. To define a Skolem function for v we only consider the clauses with *unique consequence* v, denoted U_v, which contain a literal of v and otherwise only literals of variables in D. (Note that ϕ|D ∪ U_v = ϕ|D∪{v}.) We say that variable v has a *unique Skolem function relative to* D for χ if for all assignments to D satisfying χ and ϕ|D there is a unique assignment to v satisfying U_v.
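A sketch of collecting the clauses with unique consequence v (our own illustration, with clauses as sets of signed integers and D as a set of variable indices):

```python
def unique_consequences(clauses, v, D):
    """U_v: the clauses that contain a literal of v and otherwise
    only literals of variables already in D."""
    return [c for c in clauses
            if any(abs(l) == v for l in c)
            and all(abs(l) in D or abs(l) == v for l in c)]

# For x1=1, x2=2, y1=3 and the clauses of line (1) plus one clause
# mentioning y2=4, only the first three clauses are in U_{y1}.
cls = [frozenset({1, -3}), frozenset({2, -3}),
       frozenset({-1, -2, 3}), frozenset({-3, 4})]
U_y1 = unique_consequences(cls, 3, {1, 2})
```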

In order to determine unique Skolem functions relative to a set D in practice, we split the definition into the two statements deterministic and unconflicted. Each statement can be checked by a SAT solver and together they imply that variable v has a unique Skolem function relative to D.

Given a clause C with unique consequence v, let us call ¬(C \ {v, ¬v}) the *antecedent* of C. Further, let $\mathcal{A}_l = \bigvee_{C \in U_v,\, l \in C} \neg(C \setminus \{v, \neg v\})$ be the disjunction of the antecedents of the unique consequences containing the literal l of v. It is clear that whenever A_v is satisfied, v needs to be true, and whenever A_¬v is satisfied, v needs to be false. We define:

$$\begin{aligned} \mathsf{deterministic}(v, \varphi, \chi, D) &:= \forall D.\ \varphi|_D \wedge \chi \Rightarrow \mathcal{A}_v \vee \mathcal{A}_{\neg v} \\ \mathsf{unconflicted}(v, \varphi, \chi, D) &:= \forall D.\ \varphi|_D \wedge \chi \Rightarrow \neg(\mathcal{A}_v \wedge \mathcal{A}_{\neg v}) \end{aligned}$$

deterministic states that v needs to be assigned either true or false for every assignment to D in the domain χ that is consistent with the existing Skolem function definitions ϕ|D. Accordingly, unconflicted states that v does not have to be true and false at the same time (which would indicate a conflict) for any such assignment. Unique Skolem functions relative to a set D approximate unique Skolem functions as follows:
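For small formulas, both judgements can be checked by brute-force enumeration instead of a SAT call. The following sketch (our own illustration, with clauses as sets of signed integers) evaluates the antecedents A_v and A_¬v over all assignments to D:

```python
from itertools import product

def satisfies(clause, a):
    """A clause holds if some literal is true under assignment a."""
    return any(a[abs(l)] == (l > 0) for l in clause)

def antecedent_fires(U_v, v, sign, a):
    """A_l for l = sign*v: some clause in U_v contains l and has
    all of its other literals falsified by a."""
    return any(sign * v in c
               and all(a[abs(l)] != (l > 0) for l in c if abs(l) != v)
               for c in U_v)

def check(clauses, v, D, chi=lambda a: True):
    """Return (deterministic, unconflicted) for v relative to D,
    enumerating all assignments to D (brute force)."""
    U_v = [c for c in clauses
           if any(abs(l) == v for l in c)
           and all(abs(l) in D or abs(l) == v for l in c)]
    phi_D = [c for c in clauses if all(abs(l) in D for l in c)]
    det = unconf = True
    for bits in product([False, True], repeat=len(D)):
        a = dict(zip(sorted(D), bits))
        if not chi(a) or not all(satisfies(c, a) for c in phi_D):
            continue  # outside the domain or inconsistent with phi|D
        pos = antecedent_fires(U_v, v, +1, a)
        neg = antecedent_fires(U_v, v, -1, a)
        det = det and (pos or neg)
        unconf = unconf and not (pos and neg)
    return det, unconf

# y1 (= variable 3) with the clauses of line (1) over x1=1, x2=2:
result = check([frozenset({1, -3}), frozenset({2, -3}),
                frozenset({-1, -2, 3})], 3, {1, 2})
```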

**Lemma 2.** *Let the existential variables in* D *have unique Skolem functions for domain* χ *and let* v ∈ Y *have a unique Skolem function relative to* D *for domain* χ*. Then* v *has a unique Skolem function for domain* χ*.*

# **3 Inference Rules for Incremental Determinization**

In this section, we develop a nondeterministic algorithm that formalizes and generalizes ID. We describe the algorithm in terms of inference rules that specify how the state of the algorithm may evolve. The state of the algorithm is a tuple (s, C, D, χ, α) consisting of the following elements:

- a status s, which is one of Ready, Conflict(L, x), SAT, and UNSAT,
- a stack C of sets of clauses,
- a stack D of sets of variables,
- a formula χ over X characterizing the remaining domain, and
- a conjunction α of assumed literals.


We assume that we are given a 2QBF in PCNF ∀X.∃Y. ϕ and that all clauses in ϕ contain an existential variable. (If ϕ contains a non-tautological clause

$$\begin{array}{c} \textsc{Propagate}\ \dfrac{(\mathsf{Ready},\, C,\, D,\, \chi,\, \alpha) \qquad v \notin D \qquad \mathsf{deterministic}(v, C, \chi \wedge \alpha, D) \qquad \mathsf{unconflicted}(v, C, \chi \wedge \alpha, D)}{(\mathsf{Ready},\, C,\, \mathit{add}(D, |D|-1, v),\, \chi,\, \alpha)} \\[3ex] \textsc{Decide}\ \dfrac{(\mathsf{Ready},\, C,\, D,\, \chi,\, \alpha) \qquad v \notin D \qquad \text{all } c \in \delta \text{ have unique consequence } v}{(\mathsf{Ready},\, C.\delta,\, D.\emptyset,\, \chi,\, \alpha)} \\[3ex] \textsc{Sat}\ \dfrac{(\mathsf{Ready},\, C,\, D,\, \chi,\, \mathbf{1}) \qquad D = X \cup Y}{(\mathsf{SAT},\, C,\, D,\, \chi,\, \mathbf{1})} \end{array}$$

**Fig. 1.** Inference rules needed to prove true QBF

without existential variables, the formula is trivially false by universal reduction.) We define (Ready, ϕ, X, **1**, **1**) to be the initial state of the algorithm. That is, the clause stack C initially has height 1 and contains the clauses of the formula ϕ, and we initialize D as the stack of height 1 containing the universal variables X.

Before we dive into the inference rules, we want to point out that some of the rules in this calculus are not computable in polynomial time. The judgements deterministic and unconflicted require us to solve a SAT problem and are, in general, NP-complete. This is still easier than the 2QBF problem itself (unless NP includes $\Pi_2^P$), and in practice they can be discharged quickly by SAT solvers.

### **3.1 True QBF**

We continue by describing the basic version of ID, consisting of the rules in Figs. 1 and 2, and first focus on the rules in Fig. 1, which suffice to prove true 2QBFs. Propagate allows us to add a variable to D if it has a unique Skolem function relative to D. (The notation add(D, |D| − 1, v) means that we add v to the last level of the stack.) The judgements deterministic and unconflicted involve the current set of clauses C (i.e., the union of all sets of clauses in the sequence C). These checks are restricted to the domain χ ∧ α. Both χ and α are **1** throughout this section; we discuss their use in Sects. 4 and 5.

**Invariant 1.** All existential variables in D have a unique Skolem function for the domain χ ∧ α in the formula ∀X.∃Y. C|D, where C|D are the clauses in C that contain only variables in D.

If Propagate identifies all variables to have unique Skolem functions relative to the growing set D, we know that they also have unique Skolem functions (Lemma 2). We can then apply Sat to reach the SAT state, representing that the formula has been proven true (Lemma 1).

**Lemma 3.** *ID cannot reach the SAT state for false QBF.*

*Proof.* Let us assume we reached the SAT state for a false 2QBF and prove the statement by way of contradiction. The SAT state can only be reached by the rule Sat, which requires D = X ∪ Y. By Invariant 1, all existential variables have a Skolem function in ∀X.∃Y. C. Since C includes ϕ, these Skolem functions do not violate any clause of ϕ, so they prove the formula true, contradicting the assumption.

When Propagate is unable to determine the existence of a unique Skolem function (i.e., for variables where the judgement deterministic does not hold), we can use the rule Decide to introduce additional clauses such that deterministic holds and propagation can continue. Note that additional clauses make it easier to satisfy deterministic, and adding the unit clause v even *ensures* that deterministic holds for v.

Assuming we consider a true 2QBF, we can pick a Skolem function f_y for each existential variable y and encode it using Decide. We simply consider the truth table of f_y in terms of the universal variables and define δ to be the set of clauses {¬x ∨ y | f_y(x)} ∪ {¬x ∨ ¬y | ¬f_y(x)}. (Here we interpret the assignment x as a conjunction of literals, so ¬x is a clause.) These clauses have unique consequence y, and they guarantee that y is deterministic. Further, they guarantee that y is unconflicted, as otherwise f_y would not be a Skolem function, so we can apply Propagate to add y to D. Repeating this process for every variable lets us reach a point where Y ⊆ D, and we can apply Sat to reach the SAT state.
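The truth-table encoding of a Skolem function as decision clauses can be sketched as follows (our own illustration; the function f is given as a Python predicate over assignments, and clauses are sets of signed integers):

```python
from itertools import product

def decision_clauses(f, X, y):
    """Encode a Skolem function f: assignment -> bool for the
    existential y as one decision clause per truth-table row."""
    delta = []
    for bits in product([False, True], repeat=len(X)):
        a = dict(zip(X, bits))
        # ¬x: the negated conjunction of the row's literals, as a clause
        row = [(-x if a[x] else x) for x in X]
        delta.append(frozenset(row + [y if f(a) else -y]))
    return delta

# Encoding f_{y1}(x1, x2) = x1 ∧ x2 with x1=1, x2=2, y1=3
# yields four clauses, one per row of the truth table.
delta = decision_clauses(lambda a: a[1] and a[2], [1, 2], 3)
```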

### **Lemma 4.** *ID can reach the SAT state for true QBF.*

Note that proving the truth of a QBF in this way requires guessing correct Skolem functions for all existentials. In Subsect. 3.4 we discuss how termination is guaranteed with a simpler type of decisions.

$$\begin{array}{c} \textsc{Conflict}\ \dfrac{(\mathsf{Ready},\, C,\, D,\, \chi,\, \alpha) \qquad x \text{ refutes } \mathsf{unconflicted}(v, C, \chi \wedge \alpha, D)}{(\mathsf{Conflict}(\{v, \neg v\}, x),\, C,\, D,\, \chi,\, \alpha)} \\[3ex] \textsc{Analyze}\ \dfrac{(\mathsf{Conflict}(L, x),\, C,\, D,\, \chi,\, \alpha) \qquad c \in C(0) \qquad l \in L \qquad \bar{l} \in c}{(\mathsf{Conflict}(L \otimes_{\mathit{var}(l)} c,\, x),\, C,\, D,\, \chi,\, \alpha)} \\[3ex] \textsc{Learn}\ \dfrac{(\mathsf{Conflict}(L, x),\, C,\, D,\, \chi,\, \alpha) \qquad \mathit{var}(L) \not\subseteq D}{(\mathsf{Ready},\, \mathit{add}(C, 0, L),\, D,\, \chi,\, \alpha)} \\[3ex] \textsc{Unsat}\ \dfrac{(\mathsf{Conflict}(L, x),\, C,\, D,\, \chi,\, \alpha) \qquad \mathit{var}(L) \subseteq D(0) \qquad x \not\models L}{(\mathsf{UNSAT},\, C,\, D,\, \chi,\, \alpha)} \\[3ex] \textsc{Backtrack}\ \dfrac{(s,\, C,\, D,\, \chi,\, \alpha) \qquad 0 < \mathit{dlvl} \le |C|}{(s,\, C[0, \mathit{dlvl}],\, D[0, \mathit{dlvl}],\, \chi,\, \alpha)} \end{array}$$

**Fig. 2.** Inference rules needed to disprove false QBF

### **3.2 False QBF**

To disprove false 2QBFs, i.e., formulas that do not have a Skolem function, we need the rules in Fig. 2 in addition to Propagate and Decide from Fig. 1. The conflict state can only be reached via the rule Conflict, which requires that a variable v is conflicted, i.e., that unconflicted fails. The Conflict rule stores the assignment x to D that proves the conflict and creates the nucleus {v, ¬v} of the learnt clause. Via Analyze we can then resolve that nucleus with clauses in C(0), which consists of the original clauses and the clauses learnt so far. We are allowed to add the learnt clause back to C(0) by applying Learn.

**Invariant 2.** C(0) is equivalent to ϕ.

Note that C(0) and ϕ are propositional formulas over X ∪ Y . Their equivalence means that they have the same set of satisfying assignments. We prove Invariant 2 together with the next invariant.

**Invariant 3.** Clause L in conflict state Conflict(L, x) is implied by ϕ.

*Proof.* C(0) contains ϕ initially and is only ever changed by adding clauses through the Learn rule, so C(0) ⇒ ϕ holds throughout the computation.

We prove the other direction of Invariants 2 and 3 by mutual induction. Initially, C(0) consists exactly of the clauses ϕ, satisfying Invariant 2. The nucleus of the learnt clause v ∨ ¬v is trivially true, so it is implied by any formula, which gives us the base case of Invariant 3. Analyze is the only rule modifying L, and hence soundness of resolution together with Invariant 2 already gives us the induction step for Invariant 3 [30]. Since Learn is the only rule changing C(0), Invariant 3 implies the induction step of Invariant 2.

When adding the learnt clause to C(0) we have to make sure that Invariant 1 is preserved. Learn hence requires that we have backtracked far enough with Backtrack, such that at least one of the variables in L is not in D anymore. In this way, L may become part of future Skolem function definitions, but will first have to be checked for causing conflicts by Propagate.

If all variables in L are in D(0) and the assignment x from the conflict violates L, we can conclude that the formula is false using Unsat. The soundness of this step follows from Invariants 1 and 3 together with the fact that x includes an assignment satisfying C(0)|D(0) (i.e., the clauses defining the Skolem functions for D(0)).

#### **Lemma 5.** *ID cannot reach the UNSAT state for true QBF.*

We will now show that we can disprove any false QBF. The main difficulty in this proof is to show that from any Ready state we can learn a *new* clause, i.e., a clause that is semantically different from any clause in C(0), and then return to the Ready state. Since there are only finitely many semantically different clauses over the variables X ∪ Y, and we cannot terminate in any other way (Lemma 3), we eventually find a clause L with var(L) ⊆ D(0), which enables us to go to the UNSAT state.

From the Ready state, we can always add more variables to D with Decide and Propagate until we reach a conflict. (Otherwise we would reach a state where D = X ∪ Y and would be able to prove SAT, contradicting Lemma 3.) We only enter a Conflict state for a variable v if there are two clauses (c1 ∨ v) and (c2 ∨ ¬v) with unique consequence v such that x ⊨ ¬c1 ∧ ¬c2 (see the definition of unconflicted).

In order to apply Analyze, we need to make sure that (c1 ∨ v) and (c2 ∨ ¬v) are in C(0). We can guarantee this by restricting Decide as follows: we say a decision for a variable v is *consistent with the unique consequences* in state (Ready, C, D, χ, α) if unconflicted(v, C.δ, χ ∧ α, D). We can construct such a decision easily by applying Decide only to variables that are not conflicted already (i.e., unconflicted(v, C, χ ∧ α, D) holds) and by defining δ to be the CNF representation of ¬A_v ⇒ ¬v (i.e., we require v to be false unless a unique consequence containing literal v applies). It is clear that for this δ no new conflict for v is introduced, and hence unconflicted(v, C.δ, χ ∧ α, D).

Assuming that all decisions are taken consistent with the unique consequences, we know that when we encounter a conflict for variable v, we did not apply Decide for v, and hence the clauses (c1 ∨ v) and (c2 ∨ ¬v) causing the conflict must be in C(0). We can hence apply Analyze twice, with clauses (c1 ∨ v) and (c2 ∨ ¬v), and obtain the learnt clause L = c1 ∨ c2. Since x ⊨ ¬c1 ∧ ¬c2, the learnt clause is violated by x. As x refutes unconflicted(v, C, χ ∧ α, D) by construction, it must satisfy the clauses C|D; the learnt clause L hence cannot be in C|D. Further, L only contains variables that are in D, as (c1 ∨ v) and (c2 ∨ ¬v) were clauses with unique consequence v. So L would have been in C|D if it existed in C already, and hence L is new. We can either add the new clause to C(0) after backtracking, or we can conclude UNSAT.

# **Lemma 6.** *ID can reach the* UNSAT *state for false QBF.*

The clause learning process considered here only applies one actual resolution step per conflict (L<sup>1</sup> ⊗*<sup>v</sup>* L2). In practice, we probably want to apply multiple resolution steps before applying Learn. It is possible to use the conflicting assignment *x* to (implicitly) construct an implication graph and mimic the clause learning of SAT solvers [8,31].
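The two Analyze steps of such a conflict can be replayed with a small resolution helper; we use the clauses of line (4) from the example in Subsect. 3.3, with the integer encoding y1 = 1, y3 = 3, y4 = 4 as our own choice for illustration:

```python
def resolve(c1, c2, pivot):
    """Resolvent of (c1 ∨ pivot) and (c2 ∨ ¬pivot)."""
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

y1, y3, y4 = 1, 3, 4
nucleus = frozenset({y4, -y4})   # conflict nucleus {y4, ¬y4}
c1 = frozenset({-y1, y4})        # (¬y1 ∨ y4)
c2 = frozenset({-y3, -y4})       # (¬y3 ∨ ¬y4)

step = resolve(nucleus, c2, y4)  # first Analyze application
learnt = resolve(c1, step, y4)   # second Analyze application
# learnt is now (¬y1 ∨ ¬y3), the clause added by Learn in the example
```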

### **3.3 Example**

We now discuss the application of the inference rules along the following formula:

$$\forall x\_1, x\_2. \; \exists y\_1, \dots, y\_4. \; (x\_1 \lor \neg y\_1) \; \land \; (x\_2 \lor \neg y\_1) \; \land \; (\neg x\_1 \lor \neg x\_2 \lor y\_1) \land \tag{1}$$

$$\left(\neg x\_2 \lor y\_2\right) \land \left(\neg y\_1 \lor y\_2\right) \land \left(x\_2 \lor y\_1 \lor \neg y\_2\right) \land \tag{2}$$

$$(y\_1 \lor \neg y\_3) \land (y\_2 \lor \neg y\_3) \land \tag{3}$$

$$(\neg y\_1 \lor y\_4) \land (\neg y\_3 \lor \neg y\_4) \tag{4}$$

Initially, the state of the algorithm is the tuple (Ready, ϕ, X, **1**, **1**). The rule Propagate can be applied to y1 in the initial state, as we are in the Ready state, y1 ∉ X, and y1 satisfies the checks deterministic and unconflicted: the antecedents of y1 are A_y1 = x1 ∧ x2 and A_¬y1 = ¬x1 ∨ ¬x2 (see the clauses in line (1)). It is easy to check that both A_y1 ∨ A_¬y1 and ¬(A_y1 ∧ A_¬y1) hold for all assignments to x1 and x2. The state resulting from the application of Propagate is (Ready, ϕ, X ∪ {y1}, **1**, **1**). (Alternatively, we could apply Decide in the initial state, but deriving unique Skolem functions is generally preferable.)

While Propagate was not applicable to y2 before, it now is, as the increased set D made y2 deterministic (see the clauses in line (2)). We can thus derive the state (Ready, ϕ, X ∪ {y1, y2}, **1**, **1**).

Now we have run out of variables to propagate, and the only applicable rule is Decide. We arbitrarily choose y3 as our decision variable and arbitrarily introduce the single clause δ = {(¬y1 ∨ ¬y2 ∨ y3)}, arriving in the state (Ready, ϕ.δ, X ∪ {y1, y2}, **1**, **1**). We can immediately apply Propagate (consider δ and the clauses in line (3)) to add the decision variable to the set D and arrive at (Ready, ϕ.δ, X ∪ {y1, y2, y3}, **1**, **1**).

We could now apply Backtrack to undo the last decision, but this would not be productive. Instead, we identify y4 as conflicted and enter a conflict state with Conflict: (Conflict({y4, ¬y4}, x1 ∧ x2), ϕ.δ, X ∪ {y1, y2, y3}, **1**, **1**). To resolve the conflict we apply Analyze twice, once with each of the clauses in line (4), bringing us into the state (Conflict({¬y1, ¬y3}, x1 ∧ x2), ϕ.δ, X ∪ {y1, y2, y3}, **1**, **1**). We backtrack one level, such that D = X ∪ {y1, y2}, and then apply Learn to enter the state (Ready, ϕ ∪ {(¬y1 ∨ ¬y3)}, X ∪ {y1, y2}, **1**, **1**).

The rest is simple: we apply Propagate to y3 and take a decision for y4. As no other variable can depend on y4, we can take an arbitrary decision for y4 that makes y4 deterministic, as long as it does not make y4 conflicted. Finally, we can propagate y4 and then apply Sat to conclude that we have found Skolem functions for all existential variables.

### **3.4 Termination**

So far, we have described a sound but nondeterministic algorithm that allows us to prove or disprove any 2QBF. We can easily turn the algorithm in the proof of Lemma 6 into a *deterministic* algorithm that terminates for both true and false QBF by introducing an arbitrary ordering of variables and assignments: whenever there is nondeterminism in the application of one of the rules as described in Lemma 6, pick the smallest variable for which one of the rules is applicable. When multiple rules are applicable for that variable, pick them in the order in which they appear in the figures. When an inference rule allows multiple assignments, pick the smallest. In particular, this guarantees that the existential variables are added to D in the arbitrarily picked order, as for any existential not in D we can apply either Propagate, Decide, or Conflict.

Restricting Decide to decisions that are consistent with the unique consequences may be unintuitive for true QBF, where we try to find a Skolem function. However, whenever we make the 2QBF false by introducing clauses with Decide, we will eventually go to a conflict state and learn a new clause. Deriving the learnt clause for conflicted variable v from two clauses with unique consequence v (as described for Lemma 6) means that we push the constraints


**Fig. 3.** Concepts in ID and their counterparts in CDCL

towards *smaller* variables in the variable ordering. The learnt clause will thus improve the Skolem function for a smaller variable or cause another conflict for a smaller variable. In the extreme case, we will eventually learn clauses that look like function table entries, as used in Lemma 4, i.e. clauses containing exactly one existential variable. At some point, even with our restriction for Decide, we cannot make a "wrong" decision: The cases for which a variable does not have a clause with unique consequence are either irrelevant for the satisfaction of the 2QBF or our restricted decisions happen to make the right assignment.

In cases where no static ordering of variables is used, as will be the case in any practical approach, the termination argument for true QBF is less obvious but follows the same idea: given enough learnt clauses, the relationships between the variables are dense enough that even naive decisions suffice.

#### **3.5 Pure Literals**

The original paper on ID introduces the notion of *pure literals* for QBF, which allows us to propagate a variable v even if it is not deterministic: for a literal l of v, all clauses c in which l occurs must either be satisfied or have l as the unique consequence of c. The formalization presented in this section allows us to conclude that pure literals are a special case of Decide: we can introduce clauses defining v to be of polarity l whenever all clauses containing l are satisfied by another literal.

That is, we can precisely characterize the minimal set of cases in which v has to be of polarity l and the decision is guaranteed to never introduce unnecessary conflicts. The same definition cannot be made when l occurs in clauses where it is not a unique consequence, as then the clause contains another variable that is not deterministic yet.

### **3.6 Relation of ID and CDCL**

There are some obvious similarities between ID and conflict-driven clause learning (CDCL) for SAT. Both algorithms modify their partial assignments by propagation, decisions, clause learning, and backtracking. The main difference between the algorithms is that, while CDCL solvers maintain a partial assignment of Boolean values to variables, ID maintains a partial assignment of functions to variables (which is represented by the clauses C|*D*). We summarized our observations in Fig. 3.

$$\begin{array}{c} \textsc{InductiveRefinement}\ \dfrac{(\mathsf{Conflict}(L, x),\, C,\, D,\, \chi,\, \alpha) \qquad \varphi(x|_X, y)}{(\mathsf{Conflict}(L, x),\, C,\, D,\, \chi \wedge \neg\varphi(y),\, \alpha)} \\[3ex] \textsc{Failed}\ \dfrac{(\mathsf{Conflict}(L, x),\, C,\, D,\, \chi,\, \alpha) \qquad \varphi(x|_X) \text{ is unsatisfiable}}{(\mathsf{UNSAT},\, C,\, D,\, \chi,\, \alpha)} \end{array}$$

**Fig. 4.** Inference rules adding inductive reasoning to ID

### **4 Inductive Reasoning**

The CEGIS approach to solving a 2QBF ∀X.∃Y. ϕ is to iterate over assignments x to X and check whether there is an assignment y such that ϕ(x, y) holds. Upon every successful iteration we exclude all assignments to X for which y is a matching assignment. If the space of X assignments is exhausted, we conclude that the formula is true; if we find an assignment to X for which there is no matching Y assignment, the formula is false [21].
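For small formulas, this loop can be sketched by brute force; the helper `sat` stands in for the SAT calls of a real CEGIS implementation, and the signed-integer clause encoding is our own choice:

```python
from itertools import product

def sat(clauses, fixed, free):
    """Find an assignment to `free` extending `fixed` that satisfies
    all clauses, or None (brute force, standing in for a SAT call)."""
    for bits in product([False, True], repeat=len(free)):
        a = {**fixed, **dict(zip(free, bits))}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            return a
    return None

def cegis_2qbf(clauses, X, Y):
    """Decide ∀X.∃Y. φ by iterating candidate X assignments and
    excluding every X assignment covered by the found Y assignment."""
    remaining = [dict(zip(X, bits))
                 for bits in product([False, True], repeat=len(X))]
    while remaining:
        x = remaining[0]
        y = sat(clauses, x, Y)
        if y is None:
            return False             # x has no matching Y assignment
        # keep only the X assignments NOT covered by this y
        remaining = [x2 for x2 in remaining
                     if sat(clauses, {**x2, **{v: y[v] for v in Y}}, []) is None]
    return True
```

For instance, ∀x.∃y. (x ∨ y) ∧ (¬x ∨ ¬y) is true (take y = ¬x), while adding the clause (¬y) makes it false.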

While this approach shows poor performance on some problems, as discussed in the introduction, it is widely popular and has been applied successfully in many cases. In this section we present an elegant way to integrate it into ID. The simplicity of the CEGIS approach carries over to our extension of ID: we only need the two additional inference rules in Fig. 4.

We exploit the fact that ID already generates assignments x to X in its conflict check. Whenever ID is in a conflict state, the rules in Fig. 4 allow us to check whether there is an assignment y to Y that, together with x|_X (the part of x assigning the variables in X), satisfies ϕ. If there is such an assignment y, we can let the Skolem functions output y for the input x|_X. But the output y may work for other assignments to X, too. The set of all assignments to X for which y works as an output is easily characterized by ϕ(y).<sup>1</sup> InductiveRefinement allows us to exclude the assignments satisfying ϕ(y) from χ, which represents the domain (i.e., the assignments to X) for which we still need to find a Skolem function.

This gives rise to a new invariant, stating that ¬χ only includes assignments to X for which we know that there is an assignment to Y satisfying ϕ. With this invariant it is clear that Lemma 3 also holds for arbitrary χ.

**Invariant 4.** ∀X.∃Y. ¬χ ⇒ ϕ

It is easy to check that Propagate preserves Invariant 1 also if χ and α are not **1**. Invariants 2 and 3 are unaffected by the rules in this section. To make sure that Lemma 5 is preserved as well, we thus only have to inspect Failed, which is trivially sound.

<sup>1</sup> We can actually exploit the Skolem functions that do not depend on decisions and exclude C(0)(y|_D(0)) from χ instead, i.e., the set of assignments to D(0) to which the part of y that is not in D(0) is a solution.

$$\begin{array}{c} \textsc{Assume}\ \dfrac{(\mathsf{Ready},\, C,\, D,\, \chi,\, \alpha) \qquad \mathit{var}(l) \in D(0)}{(\mathsf{Ready},\, C,\, D,\, \chi,\, \alpha \wedge l)} \\[3ex] \textsc{Close}\ \dfrac{(\mathsf{Ready},\, C,\, D,\, \chi,\, \alpha) \qquad D = X \cup Y}{(\mathsf{Ready},\, C(0),\, D(0),\, \chi \wedge \neg\alpha,\, \mathbf{1})} \end{array}$$

**Fig. 5.** Inference rules adding case distinctions to ID

*A Portfolio Approach?* In principle, we could generate assignments x independently of the conflict check of ID. The result would be a portfolio approach that simply executes ID and CEGIS in parallel and takes the result from whichever method terminates first. The idea behind our extension is that conflict assignments are more selective and may thus increase the probability that we hit a refuting assignment to X. Also, ID may profit from excluding groups of assignments that frequently cause conflicts. We revisit this question in Sect. 6.

*Example.* We extend the example from Subsect. 3.3 from the point where we entered the conflict state (Conflict({y4, ¬y4}, x1 ∧ x2), ϕ.δ, X ∪ {y1, y2, y3}, **1**, **1**). We can apply InductiveRefinement, checking that there is indeed a solution to ϕ for the assignment x1, x2 to the universals (e.g. y1, y2, ¬y3, y4). Instead of performing the standard conflict analysis as in our previous example, we can apply Learn to add the (useless) clause y4 ∨ ¬y4 to C(0) without any backtracking. That is, we effectively ignore the conflict and go to the state (Ready, ϕ ∪ {(y4 ∨ ¬y4)}.δ, X ∪ {y1, y2, y3}, ¬x1 ∨ ¬x2, **1**).

There is no assignment to X that provokes a conflict for y4 other than the one we excluded through InductiveRefinement. We can thus take an arbitrary decision for y4 that is consistent with the unique consequences (see Subsect. 3.2), propagate y4, and then conclude that the formula is true.

# **5 Expansion**

Universal expansion (defined in Sect. 2) is another fundamental proof rule that deals with universal variables. It has been used in early QBF solvers [10] and has later been integrated in CEGAR-style QBF solvers [26,32].

One way to look at the expansion of a universal variable x is that it introduces a case distinction over the possible values of x in the Skolem functions. However, instead of creating a copy of the formula explicitly, which often caused a blowup in required memory, we can reason about the two cases sequentially. The rules in Fig. 5 extend ID by universal expansion in this spirit.

Using Assume we can, at any point, assume that a variable v in D(0), i.e., a variable that has a unique Skolem function without any decisions, has a particular value. This is represented by extending α by the corresponding literal of v, which restricts the domain of the Skolem functions that we try to construct in subsequent deterministic and unconflicted checks. Invariant 1 and Lemma 5 already accommodate the case that α is not **1**.

When we reach a point where D contains all variables, we cannot apply Sat, as that requires α to be true. In this case, Invariant 1 only guarantees us that the function we constructed is correct on the domain χ ∧ α. We can hence restrict the domain for which we still need to find a Skolem function and strengthen <sup>χ</sup> by <sup>¬</sup>α. In particular, Close maintains Invariant 4. When <sup>χ</sup> ends up being equivalent to **0**, Invariant 4 guarantees that the original formula is true. (In this case we can reach the SAT state easily, as we know that from now on every application of Propagate must succeed.<sup>2</sup>)

Note that Assume does not restrict us to assumptions on single variables. Together with Decide and Propagate it is possible to introduce variables with arbitrary definitions, add them to D(0), and then assume an outcome with the rule Assume.

*Example.* Again, we consider the formula from Subsect. 3.3. Instead of the reasoning steps described in Subsect. 3.3, we start by using Assume with literal x2. Whenever checking deterministic or unconflicted in the following, we thus restrict ourselves to universal assignments that set x2 to true. It is easy to check that this allows us to propagate not only y1 and y2, but also y3. A decision (e.g. δ = {(y4)}) for y4 allows us to also propagate y4 (this time without potential for conflicts), arriving in state (Ready, ϕ.δ, X ∪ {y1, y2, y3, y4}, **1**, x2).

We can Close this case, concluding that under the assumption x2 we have found a Skolem function. We enter the state (Ready, ϕ, X, ¬x2, **1**), which indicates that in the future we only have to consider universal assignments with ¬x2. Also for the case ¬x2, we cannot encounter conflicts for this formula. Expansion hence allows us to prove this formula without any conflicts.

# **6 Experimental Evaluation**

We extended the QBF solver CADET [8] by the extensions described in Sects. 4 and 5. We use CADET-IR and CADET-E to denote the extensions of CADET by inductive reasoning (Sect. 4) and universal expansion (Sect. 5), respectively. We also combined both extensions and refer to this version as CADET-IR-E. The experiments in this section evaluate these extensions against the basic version of CADET and against other successful QBF solvers of recent years, in particular GhostQ [33], RAReQS [32], Qesto [23], DepQBF [19] in version 6, and CAQE [24,26]. For every solver except CADET and GhostQ, we use Bloqqer [34] in version 031 as a preprocessor. For our experiments, we used a machine with a 3.6 GHz quad-core Intel Xeon processor and 32 GB of memory. The timeout and memout were set to 600 s and 8 GB, respectively. We evaluated the solvers on the benchmark sets of the last competitive evaluation of QBF solvers, QBFEval-2017 [9].

<sup>2</sup> Technically, we could replace Sat by a rule that allows us to enter the SAT state whenever χ is **0**, which arguably would be more elegant. But that would require us to introduce the Close rule already for the basic ID inference system.

**Fig. 6.** Cactus plot comparing solvers on the QBFEval-2017 2QBF benchmark.

*How Does Inductive Reasoning Affect the Performance?* In Fig. 6 we see that CADET-IR clearly dominates plain CADET. It also dominates all solvers that relied on clause-level CEGAR and Bloqqer (CAQE, Qesto, RAReQS).

Only GhostQ beats CADET-IR, solving 5 more formulas (out of 384). A closer look revealed that there are many formulas for which CADET-IR and GhostQ show widely different runtimes, hinting at potential for future improvement.

GhostQ is based on the CEGAR principle, but reconstructs a circuit representation from the clauses instead of operating on the clauses directly [33]. This makes GhostQ a representative of QBF solvers working with so-called "structured" formulas (i.e., not CNF). CADET, on the other hand, refrains from identifying logic gates in CNF formulas and operates directly on the "unstructured" CNF representation. In the ongoing debate in the QBF community about the best representation for solving quantified formulas, our experimental findings can thus be interpreted as a tie between the two philosophies.

*Is the Inductive Reasoning Extension Just a Portfolio Approach?* To settle this question, we created a version of CADET-IR, called IR-only, that exclusively applies inductive reasoning by generating assignments to the universals and applying InductiveReasoning. This version of CADET does not learn any clauses, but otherwise uses the same code as CADET-IR. On the QBFEval-2017 benchmark, IR-only and CADET together solved 235 problems within the time limit, while CADET-IR solved 243 problems. That is, even though the combined runtime of CADET and IR-only was twice the runtime of CADET-IR, they solved fewer problems. CADET-IR also uniquely solved 22 problems. This indicates that CADET-IR improves over the portfolio approach.

**Fig. 7.** Cactus plot comparing solver performance on the Hardware Fixpoint formulas. Some but not all of these formulas are part of QBFEval-2017. The formulas encode diameter problems that are known to be hard for classical QBF search algorithms [35].

*How Does Universal Expansion Affect the Performance?* CADET-E clearly dominates plain CADET on QBFEval-2017, but compared to CADET-IR and some of the other QBF solvers, CADET-E shows mediocre performance overall. However, for some subsets of formulas, such as the Hardware Fixpoint formulas shown in Fig. 7, CADET-E dominated CADET, CADET-IR, and all other solvers. We also combined the two extensions of CADET to obtain CADET-IR-E. While this helped to improve the performance on the Hardware Fixpoint formulas even further, it did not change the overall picture on QBFEval-2017.

# **7 Conclusion**

Reasoning in quantified logics is one of the major challenges in computer-aided verification. Incremental Determinization (ID) introduced a new algorithmic principle for reasoning in 2QBF and delivered promising first results [8]. In this work, we formalized and generalized ID to improve the understanding of the algorithm and to enable future research on the topic. The presentation of the algorithm as a set of inference rules has allowed us to disentangle the design choices from the principles of the algorithm (Sect. 3). Additionally, we have explored two extensions of ID that both significantly improve the performance: the first integrates the popular CEGAR-style algorithms with Incremental Determinization (Sect. 4); the second integrates a different type of reasoning termed universal expansion (Sect. 5).

**Acknowledgements.** We want to thank Martina Seidl, who brought up the idea to formalize ID as inference rules, and Vijay D'Silva, who helped with disentangling the different perspectives on the algorithm. This work was supported in part by NSF grants 1139138, 1528108, 1739816, SRC contract 2638.001, the Intel ADEPT center, and the European Research Council (ERC) Grant OSARES (No. 683300).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The Proof Complexity of SMT Solvers**

Robert Robere<sup>1(B)</sup>, Antonina Kolokolova<sup>2</sup>, and Vijay Ganesh<sup>3</sup>

<sup>1</sup> University of Toronto, Toronto, Canada robere@cs.toronto.edu

<sup>2</sup> Memorial University of Newfoundland, St. John's, Canada

kol@mun.ca

<sup>3</sup> University of Waterloo, Waterloo, Canada vijay.ganesh@uwaterloo.ca

**Abstract.** The resolution proof system has been enormously helpful in deepening our understanding of conflict-driven clause-learning (CDCL) SAT solvers. In the interest of providing a similar proof complexity-theoretic analysis of satisfiability modulo theories (SMT) solvers, we introduce a generalization of resolution called Res(T). We show that many of the known results comparing resolution and CDCL solvers lift to the SMT setting, such as the result of Pipatsrisawat and Darwiche showing that CDCL solvers with "perfect" non-deterministic branching and an asserting clause-learning scheme can polynomially simulate general resolution. We also describe a stronger version of Res(T), Res∗(T), capturing SMT solvers allowing introduction of new literals. We analyze the theory EUF of equality with uninterpreted functions, and show that the Res∗(EUF) system is able to simulate an earlier calculus introduced by Bjørner and de Moura for the purpose of analyzing DPLL(EUF). Further, we show that Res∗(EUF) (and thus SMT algorithms with clause learning over EUF, new literal introduction rules and perfect branching) can simulate the Frege proof system, which is well-known to be far more powerful than resolution. Finally, we prove under the Exponential Time Hypothesis (ETH) that *any* reduction from EUF to SAT (such as the Ackermann reduction) must, in the worst case, produce an instance of size Ω(n log n) from an instance of size n.

# **1 Introduction**

It is common practice in the formal verification literature to view SAT/SMT solver algorithms as proof systems and to study their properties, such as soundness, completeness, and termination, using proof-theoretic tools [GHN+04,ORC09,Tin12]. However, much work remains in applying the powerful lens of proof complexity theory to understanding the relative power of these solvers. All too often, the power of SAT and SMT (satisfiability modulo theories) solving algorithms is judged by how they perform at the annual SAT and SMT-COMP competitions [BHJ17,smt]. While such competitions are an extremely useful practical test of the power of solving methods, they do not address fundamental questions such as which heuristics are truly responsible for the power of these solvers, or what the lower bounds for these methods are when viewed as proof systems.

Solvers, by their very nature, are a tangled jumble of heuristics that interact with each other in complicated ways. Many SMT solvers run into hundreds of thousands of lines of code, making them very hard to analyze. It is often difficult to discern which sets of heuristics are universally useful, which sets are tailored to a class of instances, and how their interactions actually help solver performance. A purely empirical approach, while necessary, is far from sufficient for deepening our understanding of solver algorithms. What is needed is an appropriate combination of empirical and theoretical approaches to understanding the power of solvers. Fortunately, proof complexity theory provides a powerful lens through which to mathematically analyze solver algorithms as proof systems and to understand their relative power via lower bounds. The value of using proof complexity theory to better understand solving algorithms as proof systems is three-fold: first, it allows us to identify the key ingredients of a solving algorithm and prove lower bounds for non-deterministic combinations of such ingredients; that is, we can analyze the countably many variants of a solving algorithm in a unified manner via a single analysis, rather than analyzing different configurations of the same set of proof-theoretic ingredients. Second, proof complexity-theoretic tools allow us to compare the relative power of two proof systems, via appropriate lower bounds, even if both have worst-case exponential time complexity. Finally, proof complexity theory brings with it a rich literature and connections to other sub-fields of complexity theory (e.g., circuit complexity) that we may be able to leverage in analyzing solver algorithms. Many proof complexity theorists and logicians have long recognized this, and there is a rich literature on the analysis of SAT solving algorithms such as DPLL and conflict-driven clause-learning (CDCL) solvers [PD11,BKS04,BBJ14,AFT11].
In this paper, we lift some of these results to the setting of SMT solvers, following the work of Bjørner and de Moura [BM14].

Our focus is primarily the proof complexity-theoretic analysis of the "DPLL(T) method"<sup>1</sup>, the prime engine behind many modern SMT solvers [GHN+04,Tin12]. (While other approaches to solving first-order formulas have been studied, DPLL(T) remains a fundamental and dominant approach.) A DPLL(T)-based SMT solver takes as input a Boolean combination of first-order theory T atoms or their negations (a.k.a. theory literals) and decides whether such an input is satisfiable. Informally, a typical DPLL(T)-based SMT solver S

<sup>1</sup> Prior to the mid 2000s, SAT researchers and complexity theorists confusingly used the term DPLL to refer both to the original algorithm proposed by Davis, Putnam, Logemann, and Loveland in the early 1960s and to the newer algorithm by Joao Marques-Silva and Karem Sakallah that added clause learning to DPLL (proposed in 1996), even though they are vastly different in power as proof systems. We will follow the literature and use DPLL(T) to indicate a "modern" SMT solver with clause learning and restarts, but we urge SMT solver researchers to use the more appropriate term CDCL(T) rather than DPLL(T) to refer to the lazy approach to SMT.

is essentially a CDCL Boolean SAT solver that calls out to a theory solver T<sub>s</sub> during its search to perform *theory propagations* and *theory conflict-clause learning*. The typical theory solver T<sub>s</sub> is designed to accept only quantifier-free conjunctions of theory T literals (the T in the term DPLL(T)), while the SAT solver "handles" the Boolean structure of input formulas. Roughly speaking, the SMT solver S works as follows: first, it constructs a Boolean abstraction B<sub>F</sub> of the input formula F by replacing theory literals with Boolean variables. If B<sub>F</sub> is UNSAT, S returns UNSAT. Otherwise, satisfying assignments to the Boolean abstraction B<sub>F</sub> are found, which in turn correspond to conjunctions of theory literals. Such conjunctions are then input to the theory solver T<sub>s</sub>, which may deduce new implied formulas (via theory propagation and conflict-clause learning) that are then used to help prune the search space of assignments to F. The solver S returns SAT upon finding a satisfying theory assignment to the input F, and UNSAT otherwise. (For further details, we refer the reader to the excellent exposition on this topic by Tinelli [Tin12].)
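
The lazy loop described above can be sketched in a few lines. This is a deliberately naive illustration of ours (not the architecture of any particular solver): it enumerates assignments of the Boolean abstraction instead of running a CDCL search, and it learns whatever theory conflict clauses the theory-solver callback returns.

```python
from itertools import product

def lazy_smt(clauses, atoms, theory_consistent):
    """Minimal sketch of the lazy DPLL(T) loop.

    clauses: Boolean abstraction B_F, a list of clauses over atom ids
             (non-zero ints, negative = negated).
    atoms: mapping from atom id to the theory literal it abstracts
           (kept only for readability; unused by this sketch).
    theory_consistent: callback taking {atom id: bool} and returning
        (True, None) or (False, conflict_clause).
    """
    learned = []
    varlist = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(varlist)):
        model = dict(zip(varlist, bits))
        satisfied = lambda c: any(model[abs(l)] == (l > 0) for l in c)
        if not all(satisfied(c) for c in clauses + learned):
            continue                      # not a model of B_F plus lemmas
        ok, conflict = theory_consistent(model)
        if ok:
            return ("SAT", model)
        learned.append(conflict)          # theory conflict clause
    return ("UNSAT", learned)
```

For example, with atoms 1: a=b, 2: b=c, 3: a=c and input clauses forcing 1, 2, and ¬3, the theory callback rejects the only Boolean model with the transitivity conflict clause (¬1 ∨ ¬2 ∨ 3), after which the loop reports UNSAT.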

**A Brief Description of the** Res(T) **Proof System:** To abstractly model a DPLL(T)-based SMT solver S, we define a proof system Res(T) below for a given first-order theory T. The Res in Res(T) refers to the general resolution proof system for Boolean logic. Without loss of generality, we assume that Res(T) accepts theory formulas in conjunctive normal form (CNF). Let F denote a CNF formula with propositional variables representing atoms from an underlying theory T, and let vars(F) denote the set of propositional atoms occurring in F. The proof rules of Res(T) augment the resolution proof rule as follows: a proof in Res(T) is a general resolution refutation of F in which, at any step, the theory T-solver can add to the set of clauses an arbitrary clause C such that T ⊨ C and every propositional atom in vars(C) occurs in the original formula. That is, each line of a Res(T) proof is deduced by one of the following rules:

**Resolution.** From C ∨ ℓ and D ∨ ¬ℓ derive C ∨ D, for previously derived clauses C and D.

**Theory Derivation.** Derive C for any clause C such that T ⊨ C and for which every theory literal in C occurs in the input formula.

For example, a theory of linear arithmetic may introduce a clause (x ≥ 5 ∨ y ≥ 7 ∨ x + y < 12), which can then be used in the subsequent steps of a resolution proof, provided each of those literals occurred in the original CNF formula F. The filter on the theory rule of Res(T) models the fact that in many modern SMT solvers, the "theory solver" is only allowed to reason about literals which already occur in the formula. Recent solvers such as Z3 and Yices [Z3,Yic] break this rule and allow the theory solver to introduce new propositional atoms; to model this we introduce the stronger variant Res<sup>∗</sup>(T) with a strengthened theory rule:

**Strong Theory Derivation:** Derive C for any clause C such that T ⊨ C.

### **1.1 Our Contributions**

We prove the following results about the two systems Res(T) and Res∗(T) and the complexity of SMT solving.


These results seem to suggest that our generalization is the "right" proof system corresponding to DPLL(T), as it characterizes proofs produced by DPLL(T) and it can simulate other proof systems introduced in the literature to capture DPLL(T) for particular theories T.

### **1.2 Previous Work**

Among the previous proof systems combining resolution with non-propositional reasoning are the R(CP) proof system of [Kra98], where propositional variables are replaced with linear inequalities, and R(lin), introduced by Raz and Tzameret [RT08], which reasons with linear equalities by modifying the resolution rule. R(lin) polynomially simulates R(CP) when all coefficients in an R(CP) proof are polynomially bounded. In the SMT community, Bjørner et al. [BDdM08,BM14] introduced calculi capturing the power of resolution over the theory of equality and equality with uninterpreted functions. They show that these systems capture the power of resolution over the corresponding theories, extended with rules for introducing new atoms. Our results supersede this previous work, since our simulations hold for any first-order theory T.

### **2 Preliminaries**

### **2.1 Propositional Proof Systems**

In this paper, all proof systems are defined by a set of "allowed lines" equipped with a list of deduction rules that allow us to deduce new lines from old ones. We first recall the *resolution* system, which is a refutation system for propositional formulas in CNF (product of sums) form. The lines of a resolution proof are disjunctions of Boolean literals called *clauses*, and these lines are equipped with a single deduction rule called the *resolution rule*: given two clauses of the form C ∨ ℓ and D ∨ ¬ℓ, we deduce the clause C ∨ D. If φ = C<sub>1</sub> ∧ C<sub>2</sub> ∧ ··· ∧ C<sub>m</sub> is an unsatisfiable CNF formula, then a resolution refutation of φ is a sequence of clauses C<sub>1</sub>, C<sub>2</sub>, ..., C<sub>m</sub>, C<sub>m+1</sub>, ..., C<sub>t</sub> where C<sub>t</sub> is the empty clause and every clause C<sub>i</sub> with i > m is deduced from earlier clauses by applying the resolution rule.
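
The resolution rule is mechanical enough to state as code. Below is a small helper of our own (not from the paper), together with a three-step refutation of the unsatisfiable CNF (x) ∧ (¬x ∨ y) ∧ (¬y):

```python
def resolve(c, d, lit):
    """Resolution rule: from C ∨ lit and D ∨ ¬lit deduce C ∨ D.
    Clauses are frozensets of non-zero ints (negative = negated)."""
    assert lit in c and -lit in d, "clauses must clash on lit"
    return (c - {lit}) | (d - {-lit})

# Refutation of (x) ∧ (¬x ∨ y) ∧ (¬y), with x = 1 and y = 2:
c1, c2, c3 = frozenset({1}), frozenset({-1, 2}), frozenset({-2})
c4 = resolve(c1, c2, 1)     # derives the clause (y)
empty = resolve(c4, c3, 2)  # derives the empty clause
```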

Observe that clauses satisfy a *subsumption principle*: if C and D are clauses such that C ⊆ D, then every assignment satisfying C also satisfies D. This implies that we can safely add a *weakening rule* to resolution which, from a clause C, derives the clause C ∨ x for any literal x not already occurring in C. The subsumption principle implies that this weakening rule does not change the power of resolution, as any use of a clause D ⊇ C can be eliminated or replaced with a use of C.

We also consider the *Frege* proof system, which captures standard "textbook-style" proofs. The lines of a Frege system are given by arbitrary Boolean formulas, and from two Boolean formulas we can deduce any new Boolean formula which follows under typical Boolean reasoning (e.g., deducing the conjunction of two formulas, the disjunction of their negations, and so on). Crucially, Frege proofs allow applying a generalized "resolution rule" to arbitrary polynomial-size formulas.

The power of different propositional proof systems is compared using the notion of a *polynomial simulation (p-simulation)*. Proof system A *polynomially simulates* (or p-simulates) proof system B if, for every unsatisfiable formula F, the shortest refutation of F in A is at most polynomially longer than the shortest refutation of F in B. For example, the Frege proof system p-simulates the resolution proof system, but the converse is widely conjectured not to hold.

#### **2.2 First-Order Theories**

In this paper we study proof systems for first-order theories. For the sake of completeness we recall some relevant definitions from first-order logic, but remark that this is essentially standard fare.

Let L be a first-order signature (a list of constant symbols, function symbols, and predicate symbols). Given a set of L-sentences A and an L-sentence B, we write A ⊨ B if every model of A is also a model of B. A *first-order theory* (or simply a *theory*) is a set of L-sentences that is consistent (that is, it has a model) and is closed under ⊨. The *decision problem* for a theory T is the following: given a set S of literals over L, decide if there is a model M of T such that M ⊨ S. The *satisfiability problem* for T, also denoted T-SAT, is the following: given a quantifier-free formula F over T in conjunctive normal form (CNF), decide if there is a model M of T such that M ⊨ F.

A simple example of a theory is E, the conjunctive theory of equality. The signature of E contains a single predicate symbol = and an infinite list of constant symbols. It is axiomatized by the standard axioms of equality (reflexivity, symmetry, and transitivity); a sample sentence in E is the formula a ≠ b ∨ b ≠ c ∨ a = c, which encodes the transitivity of equality between the constant symbols a, b, and c. Following the SMT literature, we will call terms from the theory (such as a and b) *theory variables*, and the atoms derived from these terms (such as a = b or a = c) will be called *theory literals* or just *literals*. We note that the decision problem for E can be decided very efficiently [DST80]; in contrast, the satisfiability problem for E is easily seen to be NP-complete.
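
The efficient decidability of E's decision problem can be illustrated with a standard union-find sketch (our own illustration, not the algorithm of [DST80]): merge the classes of constants forced equal by the equality literals, then check that no disequality literal connects two constants in the same class.

```python
def decide_equality(eqs, neqs):
    """Decide a conjunction of E-literals: True iff satisfiable.

    eqs:  list of (a, b) pairs asserting a = b.
    neqs: list of (a, b) pairs asserting a != b.
    Runs in near-linear time via union-find with path halving.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in eqs:
        parent[find(a)] = find(b)          # merge equivalence classes
    # Satisfiable iff no disequality joins two equal constants.
    return all(find(a) != find(b) for a, b in neqs)
```

For instance, {a = b, b = c, a ≠ c} is rejected, while {a = b, a ≠ c} is accepted.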

# **3 Res(***T* **): Resolution Modulo Theories**

We now define a generalization of resolution which captures the type of reasoning modulo a first-order theory that is common in SMT solvers. We give two variants: the first, denoted Res(T), allows the deduction of any clause C of theory literals such that T ⊨ C and for which every literal in C already occurs in the input formula. This is intended to model "standard" lazy SMT solvers [NOT06], which only reason about literals in the input formula.

The second, more powerful variant is denoted Res<sup>∗</sup>(T), and allows the deduction of any clause of literals C such that T ⊨ C, *even if* the new clause contains literals which do not occur in the input formula. We introduce this to explore the power of lazy SMT solvers that are allowed to introduce new literals from the theory, and note that there are well-known examples in the SMT literature which show that introducing new literals can drastically decrease the length of refutations (e.g., the *diamond equalities* [BDdM08]). Indeed, in Sect. 5.2 we show that this power can drastically increase the proof-theoretic strength of SMT solvers.

**Definition 1 (**Res(T), Res<sup>∗</sup>(T)**).** Let T be a theory and let F be a quantifier-free CNF formula over T. The lines of a Res(T) (Res<sup>∗</sup>(T)) proof are quantifier-free clauses of theory literals deduced from F and T by the following derivation rules.

**Resolution.** From C ∨ ℓ and D ∨ ¬ℓ derive C ∨ D.

**Weakening.** From C derive C ∨ ℓ for any theory literal ℓ occurring in the input formula.

**Theory Derivation (**Res(T)**).** Derive C for any clause C satisfying T ⊨ C and for which every literal in C occurs in the input formula.

**Strong Theory Derivation (**Res<sup>∗</sup>(T)**).** Derive C for any clause C satisfying T ⊨ C.

A *refutation* of F is a proof in which the final line is the empty clause.

It is easy to see that both Res(T) and Res∗(T) are sound since all rules are sound, and completeness follows from a straightforward modification of the usual proof of resolution completeness (see, e.g. Jukna [Juk12]).

Technically speaking, Res(T) is *not* a (formal) propositional proof system as defined by Cook and Reckhow [CR79] since the proofs may not be efficiently verifiable if deductions from the theory T are computationally difficult to verify. However, all theories considered in this paper (cf. Sect. 5) are very efficiently decidable, and thus the corresponding Res(T) proofs are efficiently verifiable.

Note that the clauses introduced by the theory derivations are arbitrary theorems of T; this means there is no direct information exchange between the resolution proof and the theory. It is enough to derive clauses in the theory derivation rules rather than arbitrary formulas since every axiom can be written in CNF form, and introduced as a sequence of clauses. The strong theory derivation rule can introduce new theory literals which might not have been present in the initial formula—we emphasize that the new theory literals can even contain theory *variables* (i.e. first-order terms) that did not occur in the original formula. We will see that this ability to introduce new literals seems to give Res<sup>∗</sup>(T) extra power over general resolution.

# **4 Lazy SMT Solvers and Res(***T* **)**

In this section we show that lazy SMT solvers and resolution modulo theories are polynomially-equivalent as proof systems, provided that the SMT solvers are given a set of branching and restart decisions *a priori*.

We model SMT solvers by the algorithm schema<sup>2</sup> DPLL(T), which is given in Algorithm 1. Using this schema we prove two results: first, if the theory solver in DPLL(T) can only reason about literals occurring in its input formula, then DPLL(T) is polynomially equivalent to Res(T). Second, if the theory solver is strengthened so that it is allowed to introduce new literals then the resulting solver can polynomially simulate Res<sup>∗</sup>(T). The proofs of these results use techniques developed for comparing Boolean CDCL solvers and resolution by Pipatsrisawat and Darwiche [PD11].

<sup>2</sup> In the literature, SMT solvers are typically defined as abstract state-transition systems (see, for instance, [GHN+04,BM14]); we have chosen to define it instead as an algorithm schema (cf. Algorithm 1) inspired by the abstract definition of a CDCL solver by Pipatsrisawat and Darwiche [PD11].


If T is a theory and A, B are formulas over T, then we write A ⊨<sub>T</sub> B as shorthand for T ∪ {A} ⊨ B (i.e., every model of the theory T that satisfies A also satisfies B). We also define *unit resolution*, which describes the action of the *unit propagator*.

**Definition 2 (Unit Resolution).** Let F be a collection of clauses over an arbitrary theory T. A clause C is derivable from F by *unit resolution* if there exists a resolution proof of C from F such that in each application of the resolution rule, one of the clauses is a unit clause. If C is derivable from F by unit resolution, then we write F ⊢<sub>1</sub> C. If F ⊢<sub>1</sub> ∅, then we say F is *unit refutable*; otherwise it is *unit consistent*.
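
Unit refutability can be decided by plain unit propagation: repeatedly assert the literal of any clause whose other literals are falsified. A minimal sketch of ours (names hypothetical, not from the paper):

```python
def unit_propagate(clauses):
    """Run unit propagation to fixpoint.

    clauses: iterable of clauses over non-zero ints (negative = negated).
    Returns (assigned, refuted): the set of propagated true literals and
    whether the empty clause was derived (i.e., F is unit refutable).
    """
    assigned = set()                      # literals currently set to true
    clauses = [set(c) for c in clauses]
    while True:
        progress = False
        for c in clauses:
            if c & assigned:
                continue                  # clause already satisfied
            live = {l for l in c if -l not in assigned}
            if not live:
                return assigned, True     # all literals falsified: refuted
            if len(live) == 1:
                assigned |= live          # unit propagation step
                progress = True
        if not progress:
            return assigned, False        # fixpoint: unit consistent
```

For example, {(x), (¬x ∨ y), (¬y)} propagates x, then y, and then falsifies (¬y), so it is unit refutable.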

A DPLL(T) algorithm is defined by specifying algorithms for each of the bolded "schemes" in Algorithm 1:

**Clause Learning Scheme.** When a clause in the database is falsified by the current partial assignment, the **Clause Learning Scheme** is applied to learn a new clause C which is added to the database of stored clauses.

**Restart Scheme.** The solver applies the **Restart Scheme** to decide whether or not to restart its search, discarding the current partial assignment σ and saving the list of learned clauses.

**Branching Scheme.** The **Branching Scheme** is applied to choose an unassigned variable from the formula <sup>F</sup> or from the learned clauses <sup>Γ</sup> and assign the variable a Boolean value.

T**-Propagate Scheme.** During search, the DPLL(T) solver can hand the theory solver the current partial assignment σ and ask whether or not it should unit-propagate a literal; if a unit propagation is possible, the theory solver will return a clause C from the theory witnessing this unit propagation.

T**-Conflict Scheme.** When the theory solver detects that the current partial assignment σ contradicts the theory, the T**-Conflict Scheme** is applied to learn a new clause of literals C with ¬C ⊆ σ, which is added to the clause database.

We pay particular attention to the specification of the T-propagate scheme. The next definition describes two types of propagation schemes: a *weak* propagation scheme is only allowed to return clauses which propagate literals occurring in the formula, while the more powerful *strong* propagation scheme returns a clause of literals from the theory that may contain new literals.

**Definition 3.** A *weak* T*-propagate scheme* is an algorithm which takes as input a conjunction of theory literals σ over T and returns (if possible) a clause C = ¬σ ∨ ℓ where T ⊨ C and the literal ℓ occurs in the input formula of the DPLL(T) algorithm.

A *strong* T*-propagate scheme* is an algorithm which takes as input a conjunction of literals σ over T and, if possible, returns a clause C of literals from T such that T ⊨ C and ¬σ ⊆ C. An algorithm equipped with a strong T-propagate scheme will be called a DPLL<sup>∗</sup>(T) solver.

A DPLL(T) algorithm equipped with a weak T-propagation scheme is equivalent to the basic theory propagation rules found in SMT solvers (see, for example, [BM14,NOT06]). For technical convenience we assume that the weak T-propagate scheme adds a clause to the database "certifying" the unit propagation, while in actual implementations the clause would likely not be added and the literal would simply be propagated. Recent SMT solvers [Yic,Z3] have strengthened the interaction between the SAT solver and the theory solver, allowing the theory solver to return constraints over new variables; this is modelled very generally by strong T-propagate schemes.

### **4.1 DPLL(***T* **) and Res(***T* **)**

We now prove the main result of this section, after introducing some preliminaries from [PD11] that are suitably modified for our setting. Fix a theory T. An *assignment trail* is a sequence of pairs σ = {(ℓ<sub>i</sub>, d<sub>i</sub>)}<sup>t</sup><sub>i=1</sub> where each ℓ<sub>i</sub> is a literal from the theory and each d<sub>i</sub> ∈ {d, p}, indicating whether the literal was set by a decision or by unit propagation. The *decision level* of a literal ℓ<sub>i</sub> in σ is the number of decision literals occurring in σ up to and including ℓ<sub>i</sub>. Given an assignment trail σ and a clause C, we say that C is *asserting* if it contains exactly one literal occurring in σ at the highest decision level. A clause learning scheme is *asserting* if all conflict clauses produced by the scheme are asserting with respect to the assignment trail at the time of conflict.
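
The asserting condition, as stated, can be checked mechanically. A small sketch of ours (the trail encoding and function name are hypothetical, not from [PD11]):

```python
def is_asserting(trail, clause):
    """Check the asserting property from the text: the clause contains
    exactly one literal occurring in the trail at the highest decision
    level.

    trail: list of (literal, tag) pairs in assignment order, where tag
           is 'd' for a decision and 'p' for a unit propagation;
           literals are non-zero ints.
    """
    level = 0
    level_of = {}
    for lit, tag in trail:
        if tag == 'd':
            level += 1                # a decision opens a new level
        level_of[lit] = level         # propagations inherit the level
    top = level
    return sum(1 for l in clause if level_of.get(l) == top) == 1
```

For the trail (ℓ<sub>1</sub> decided, ℓ<sub>2</sub> propagated, ℓ<sub>3</sub> decided, ℓ<sub>4</sub> propagated), a clause containing ℓ<sub>3</sub> but not ℓ<sub>4</sub> is asserting, while one containing both is not.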

An *extended branching sequence* is an ordered sequence B = {β₁, β₂, …, βₜ} where each βᵢ is either (1) a literal from the theory, (2) a symbol x ∈ {R, NR}, denoting a restart or no-restart, respectively, or (3) a clause C such that T ⊨ C. Intuitively, extended branching sequences provide a DPLL(T) solver with a list of instructions for how to proceed in its execution. For instance, whenever the solver calls the Branching Scheme, we consume the next βᵢ from the sequence, and if it is a literal from the theory then the solver assigns that literal. Similarly, when the DPLL(T) solver calls the Restart Scheme it uses the branching sequence to dictate whether or not to restart, and when the solver calls the T-propagate scheme it uses the sequence to dictate which clause to learn. If the symbol does not match the scheme currently being called then the solver halts with an error, and if the branching sequence is empty then the algorithm proceeds using its own heuristics.

We now introduce *absorbed* clauses (and their duals, *empowering* clauses), which were originally defined by Pipatsrisawat and Darwiche [PD11] and independently by Atserias et al. [AFT11]. One should think of the absorbed clauses as being learned "implicitly"—they may not necessarily appear in F, but, if we assign all but one of the literals in the clause to false then unit propagation in DPLL(T) will set the final literal to true.

**Definition 4 (Empowering Clauses).** *Let* F *be a collection of clauses over an arbitrary theory* T *and let* A *be a* DPLL(T) *solver. Let* α *be a conjunction of literals, and let* C = (α ⇒ ℓ) *be a clause. We say that* C *is* empowering with respect to F at ℓ *if the following hold: (1)* F ∪ T ⊨ C*, (2)* F ∧ α *is unit consistent, and (3) any execution of* A *on* F *that satisfies* α *without setting* ℓ *does not unit-propagate* ℓ*. The literal* ℓ *is said to be* empowering*. If (1) and (2) are satisfied but (3) is false, then we say that the solver* A *and* F absorb C *at* ℓ*; if* A *and* F *absorb* C *at every literal then the clause is simply* absorbed*.*

For an example, consider the set of clauses (x ∨ y ∨ z), (¬z ∨ a), (¬a ∨ b). The clause (x ∨ y ∨ b) is absorbed by this set of clauses: if, for instance, we falsify x and y, then the unit propagator will force b to be set to true. Thus in the DPLL(T) algorithm the unit propagator behaves as though this clause had been learned even though it has not (if we remove the final clause (¬a ∨ b), then (x ∨ y ∨ b) is empowering but not absorbed).
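The absorption behaviour in this example can be checked mechanically with a few lines of code. The integer literal encoding and the function below are a minimal sketch of our own, not part of any solver.

```python
def unit_propagate(clauses, assignment):
    """Repeatedly apply the unit rule and return the extended assignment.
    Literals are nonzero ints: +v means v is true, -v means v is false.
    Conflicts are not handled in this sketch."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            # Literals not yet falsified by the current assignment.
            unassigned = [lit for lit in clause if -lit not in assignment]
            if len(unassigned) == 1:  # unit clause: force the last literal
                assignment.add(unassigned[0])
                changed = True
    return assignment

# F = (x v y v z) ^ (!z v a) ^ (!a v b), with x=1, y=2, z=3, a=4, b=5.
F = [[1, 2, 3], [-3, 4], [-4, 5]]
# Falsifying x and y propagates z, then a, then b, so F absorbs (x v y v b) at b.
result = unit_propagate(F, {-1, -2})
print(5 in result)  # prints True
```

Dropping the final clause (¬a ∨ b) from F leaves b unpropagated under the same decisions, matching the remark that (x ∨ y ∨ b) is then empowering but not absorbed.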

The next lemma shows that for any theory clause C, there is an extended branching sequence which can be applied to absorb that clause.

**Lemma 5.** *Let* F *be an unsatisfiable CNF over a theory* T *and let* Π *be any* Res(T) *proof from* F*. Let* Π_T ⊆ Π *be the set of clauses in* Π *derived using the theory rule. For any* DPLL(T) *algorithm* A *there is an extended branching sequence* B *such that after applying* B *to the solver* A*, every clause in* Π_T *will be absorbed.*

*Proof.* Order Π_T arbitrarily as C₁, C₂, …, Cₜ, removing any clause that is absorbed or already occurs in F, as such clauses are already absorbed. We construct B directly: add the negations of the literals of C₁ to B until one literal remains, and then add the clause C₁ itself to the extended branching sequence. By definition the weak T-propagator will be called and will return C₁, adding it to the clause database. Restart and continue to the next theory clause in order.

Our proof of mutual simulations between Res(T) and DPLL(T) crucially relies on the following technical lemma (which is a modified version of a lemma from [PD11]).

**Lemma 6.** *Let* F *be an unsatisfiable, unit-consistent CNF over literals from a theory* T *and let* Π *be any* Res(T) *proof from* F*. Let* Π_T *be the set of clauses in* Π *derived using the theory rule. Then there exists a clause* C *in* Π *that is both empowering and unit-refutable with respect to* F ∪ Π_T*.*

*Proof.* Let Π denote a Res(T)-refutation of F and assume without loss of generality (by applying Lemma 5) that the first derived clauses in Π are the clauses of Π_T. If every clause in Π is unit-refutable from F, then the empty clause is unit-refutable and thus F is not unit-consistent, which is a contradiction. So, let Cᵢ be the first clause in Π (in this ordering) that is not unit-refutable. Since Π is a Res(T)-proof, Cᵢ is one of three types: it is a clause in F, it is a clause derived by the theory rule, or it was derived by applying the resolution rule to two clauses Cⱼ, Cₖ. If Cᵢ ∈ F then it is clearly unit-refutable, which is a contradiction. If Cᵢ was derived by the theory rule then it is unit-refutable with respect to Π_T, which is again a contradiction. Finally, suppose that Cᵢ was derived by applying the resolution rule to clauses Cⱼ and Cₖ, and write Cⱼ = (α ⇒ ℓ) and Cₖ = (β ⇒ ¬ℓ), where ℓ is the resolved literal and j, k < i in the ordering of clauses in Π. Both Cⱼ and Cₖ are unit-refutable; assume by way of contradiction that neither is empowering. Then by definition both clauses are absorbed at every literal, so, considering F ∧ α ∧ β, the absorption property yields F ∧ α ∧ β ⊢₁ ℓ and F ∧ α ∧ β ⊢₁ ¬ℓ, which implies that F ∧ α ∧ β ⊢₁ ∅ using the clauses of Π_T. However, ¬Cᵢ = α ∧ β, and thus Cᵢ is unit-refutable, which is a contradiction. Thus at least one of Cⱼ or Cₖ is both empowering and unit-refutable.

The gist of Lemma 6 is simple: if the clauses C ∨ ℓ and D ∨ ¬ℓ are both absorbed by a collection of clauses, then falsifying all literals of C and D in the DPLL solver will hit a conflict, since unit propagation will imply both ℓ and ¬ℓ. In the main theorem, proved next, we show that empowering and unit-refutable clauses will be absorbed by the solver after sufficiently many restarts.

**Theorem 7.** *The* DPLL(T) *system with an asserting clause learning scheme, non-deterministic branching and* T*-propagation polynomially simulates* Res(T)*. Equivalently: for any unsatisfiable CNF* F *over a theory* T *and any* Res(T) *refutation* Π *of* F*, there exists an extended branching sequence* B *such that running a* DPLL(T) *algorithm on input* F *using* B *refutes* F *in time polynomial in* |Π|*.*

*Proof.* Let F be an unsatisfiable CNF over the theory T, and let Π be a Res(T) refutation of F. Let Π_T ⊆ Π be the set of clauses in Π derived using the theory rule, and write Π = C₁, C₂, …, Cₘ. As a first step, apply Lemma 5 to construct an extended branching sequence B which leads to the absorption of all clauses in Π_T. We prove the following claim, from which the theorem directly follows.

**Claim.** Let C be any unit-refutable and empowering clause with respect to F. Then there exists an extended branching sequence B of polynomial size such that after applying B the clause C will be absorbed.

Let ℓ be any empowering literal of C, and write C = (α ⇒ ℓ). Let B be any extended branching sequence in which all literals in α are assigned. Since C is empowering, it follows that F ∧ α is unit-consistent. Extending B with the decision literal ¬ℓ will therefore cause a conflict, since C is unit-refutable. Let C′ be the asserting clause obtained by applying the clause learning scheme to B ∪ {¬ℓ}. If F ∧ C′ absorbs C at ℓ, then we are done and we continue to the next empowering literal. Otherwise, we resolve whatever conflicts the solver needs to resolve (possibly adding more learned clauses along the way) until the branching sequence is unit-consistent.

Observe that after this process we must have F ∧ C′ ⊢₁ ℓ′ for some literal ℓ′ at the same decision level as ℓ, since the clause learning scheme is asserting. Thus the number of literals at the maximum decision level has been reduced by one. At this point we restart and repeat exactly the same sequence of branchings; each time, as argued above, the number of literals at the maximum decision level decreases by one. Since ℓ is a literal at the maximum decision level, after at most O(n) restarts (and O(n²) learned clauses) we will have absorbed the clause C at ℓ. Repeating this process at most n times, once for each empowering literal of C, absorbs C, and it is clear from the analysis that the number of learned clauses is polynomial.

We are now ready to finish the proof. Apply the claim repeatedly to the first empowering and unit-refutable clause in Π to absorb that clause; by Lemma 6, such a clause exists as long as the CNF F is not unit-refutable, and a DPLL(T) solver can obtain an arbitrary theory clause by setting the relevant literals in the branching sequence and using theory propagation. Since the proof Π has finite length m, this process must terminate after at most m iterations. At that point no empowering and unit-refutable clause remains, and so by Lemma 6 it follows that F (together with its learned clauses) is unit-refutable, and the DPLL(T) algorithm halts and outputs UNSAT.

The reverse direction of the theorem is straightforward, and thus we have the following corollary:

**Corollary 8.** *The* DPLL(T) *system with an asserting clause learning scheme, non-deterministic branching and* T*-propagation is polynomially equivalent to* Res(T)*.*

A key point of the above simulation is that it does not depend on whether the T-propagation scheme is weak or strong: since the clauses learned by the scheme are specified in advance by the extended branching sequence, the same proof applies if we begin with a Res∗(T) proof instead of a Res(T) proof. In that case we may use the full power of the theory derivation rule, which requires a DPLL∗(T) algorithm with a strong T-propagation scheme. We record this observation as a second theorem.

**Theorem 9.** *The* DPLL∗(T) *system with an asserting clause learning scheme, non-deterministic branching and* T*-propagation is polynomially equivalent to* Res∗(T)*.*

# **5 Case Studies: Resolution Modulo Common Theories**

In this section, we study the power of Res(T) over theories that are common in the SMT context—namely, we focus on the theory of equality E, the theory of uninterpreted function symbols EUF, and the theory of linear arithmetic LA.

### **5.1 Resolution over E: A Theory of Equality**

We first consider E, the theory of equality. Bjørner et al. [BDdM08] introduced a proof-theoretic calculus called SP(E) for reasoning over the theory of equality; in a precursor to our main result, they showed that SP(E) proofs exactly characterize the proofs produced by a simple model of an SMT solver. In this section we show that the system Res∗(E) is polynomially equivalent to SP(E), which is evidence that our general framework is the correct way of capturing the power of SMT solvers.

Let us first reproduce the rules of SP(E) from [BDdM08]:
- **Cut.** From C ∨ ℓ and D ∨ ¬ℓ, derive C ∨ D.
- **E-Dis.** From C ∨ (a ≠ a), derive C.
- **E-Eqs.** From C ∨ (a ≠ b) ∨ (a = c), derive C ∨ (a ≠ b) ∨ (b = c).
- **Sup.** From C ∨ (a = b) and D[a], derive C ∨ D[b].

Observe that the Sup rule allows replacing some occurrences of a term a in atoms of a clause D with b (not necessarily all occurrences of a). Both the Sup rule and the E-Eqs rule can introduce literals that did not occur in the initial formula.

**Proposition 10.** Res∗(E) *and* SP(E) *are polynomially equivalent.*

*Proof (Sketch).* Bjørner et al. show that SP(E) exactly characterizes the proofs produced by a simple theoretical model of an SMT solver, which we denote by DPLL(e + Δ) [BDdM08, Theorem 4.1]. Examining the solver DPLL(e + Δ) from [BDdM08], it is not hard to see that it is equivalent to the algorithm DPLL∗(E) (that is, DPLL(T) with a strong T-propagation rule). The equivalence between Res∗(E) and DPLL∗(E) then follows from Theorem 9.

In the conclusion of [BDdM08] it is stated that there are no short SP(E) proofs of the following encoding of the pigeonhole principle (PHP): there are clauses of the form (dᵢ = r₁ ∨ · · · ∨ dᵢ = rₙ) for i ∈ [1, n + 1], enforcing that the ith pigeon must travel to some hole, and clauses of the form (dᵢ ≠ dⱼ) for i ≠ j ∈ [1, n + 1], which, when combined with the first family of clauses and the transitivity axioms of E, imply that no two pigeons can travel to the same hole. Since their SP(E) system is equivalent to Res∗(E), lower bounds on SP(E) carry over:

**Corollary 11.** *If* SP(E) *does not have polynomial-size refutations of the pigeonhole principle, then neither does* Res∗(E)*.*

### **5.2 Resolution over EUF: Equality with Uninterpreted Functions**

Next, we study the theory EUF, which extends the theory of equality with uninterpreted function symbols. The signature of EUF consists of an unlimited set of uninterpreted function symbols and constant symbols; a term in the theory is thus inductively defined as either a constant symbol or an application of a function symbol to a sequence of terms: f(t₁, …, tₖ). There is a single relational symbol = interpreted as equality between terms, so theory literals of EUF are of the form t = t′ for terms t, t′.

The axioms of EUF state that = is an equivalence relation, together with a family of *congruence axioms* for the function symbols stating that, for any k-ary function symbol f and any sequences of terms t₁, …, tₖ and t′₁, …, t′ₖ, if t₁ = t′₁, …, tₖ = t′ₖ, then f(t₁, …, tₖ) = f(t′₁, …, t′ₖ). The decision problem for EUF can be decided in time O(n log n) by the Downey–Sethi–Tarjan congruence closure algorithm [DST80].
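To make the decision procedure concrete, here is a naive congruence-closure sketch: a fixpoint computation over a union-find structure, not the O(n log n) Downey–Sethi–Tarjan algorithm. The term representation (constants as strings, applications as tuples) and the function names are our own.

```python
def congruence_closure(terms, equalities):
    """Decide equalities entailed by EUF axioms over the given subterm-closed
    set of terms: union-find plus congruence propagation to a fixpoint."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(s, t):
        rs, rt = find(s), find(t)
        if rs != rt:
            parent[rs] = rt

    for s, t in equalities:
        union(s, t)
    changed = True
    while changed:  # congruence: equal arguments force equal applications
        changed = False
        apps = [t for t in terms if isinstance(t, tuple)]
        for u in apps:
            for v in apps:
                if (u[0] == v[0] and len(u) == len(v)
                        and find(u) != find(v)
                        and all(find(x) == find(y)
                                for x, y in zip(u[1:], v[1:]))):
                    union(u, v)
                    changed = True
    return lambda s, t: find(s) == find(t)

# f(a) and f(b) become equal once a = b is asserted.
terms = ['a', 'b', ('f', 'a'), ('f', 'b')]
eq = congruence_closure(terms, [('a', 'b')])
print(eq(('f', 'a'), ('f', 'b')))  # prints True
```

The quadratic fixpoint loop is what the Downey–Sethi–Tarjan algorithm improves upon with careful use-lists and hashing.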

Using EUF as a central example, Bjørner and de Moura [BM14] observed that DPLL(T) suffers from serious limitations in terms of access to the underlying theory. To resolve this, they modified DPLL(EUF) with a set of non-deterministic rules that allow it to dynamically introduce clauses corresponding to the congruence and transitivity axioms. To characterize the strength of this new algorithm, they introduced a variant of resolution called E-Res, extending SP(E) from [BDdM08] to reason over uninterpreted functions. We show that the Res∗(EUF) proof system can polynomially simulate the E-Res system, which again suggests that we have the "correct" proof system for capturing SMT reasoning. Due to space considerations, we leave the proof to the full version of the paper.

**Theorem 12.** *The system* E*-*Res *is polynomially simulated by* Res∗(EUF)*.*

However, unlike the case of SP(E), the converse direction is not so clear. The theory rule in Res∗(EUF) is fundamentally *semantic*: it allows one to derive *any* clause which follows semantically from the theory EUF; this is in contrast to the E-Res system, which is fundamentally syntactic. Thus, to show that E-Res polynomially simulates Res∗(EUF), one would need to show that any use of the theory rule in a Res∗(EUF) proof can somehow be replaced with a short proof in E-Res. We leave this as an open problem.

Next, we show that Res∗(EUF) and E-Res can efficiently simulate the Frege proof system, which is a very powerful propositional proof system studied in proof complexity. We note that the simulation crucially relies on the introduction of new theory literals; this suggests that an SMT solver which can intelligently introduce new theory literals has the potential to be extremely powerful.

**Theorem 13.** Res∗(EUF) *(and, in fact,* E*-*Res*) can efficiently simulate the Frege proof system.*

*Proof (Sketch).* We show the stronger statement that E-Res simulates Frege. The idea of the proof is to introduce constants e₀ ≠ e₁ corresponding to False and True; every positive literal x in the original formula is replaced by x = e₁, and every negative literal ¬x by x = e₀. Then introduce uninterpreted function symbols N, O, A, together with constraints that make N, O, A behave as NOT, OR and AND, respectively (such as N(e₀) = e₁ ∧ N(e₁) = e₀). Formulas in the Frege refutation are then iteratively transformed into expressions of the form t_F = e₀ or t_F = e₁, where t_F is the term obtained by replacing the Boolean connectives in a formula F by N, O, A. As the Frege proof ends with an empty sequent, the corresponding E-Res proof ends with the empty clause. See the full version for details.
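The gate constraints used in this translation can be generated mechanically. The sketch below enumerates the truth tables for N, O, A over the constants e₀, e₁; the tuple representation of terms is our own illustrative choice.

```python
def gate_axioms():
    """Generate the EUF equations forcing N, O, A to behave as NOT, OR, AND
    on the truth constants e0, e1. Each axiom is a pair (term, constant),
    read as the equation term = constant."""
    e = ['e0', 'e1']
    axioms = []
    for v in (0, 1):
        axioms.append((('N', e[v]), e[1 - v]))            # NOT truth table
        for w in (0, 1):
            axioms.append((('O', e[v], e[w]), e[v | w]))  # OR truth table
            axioms.append((('A', e[v], e[w]), e[v & w]))  # AND truth table
    return axioms
```

This produces ten unit equations (two for N, four each for O and A), which is the constant-size overhead the simulation pays per connective.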

### **5.3 Resolution over LA: A Theory of Linear Arithmetic**

Finally, we study the theory of linear arithmetic LA. A formula in the theory LA over a domain D is a conjunction of expressions of the form Σⁿᵢ₌₁ aᵢxᵢ ◦ b, where ◦ ∈ {=, ≤, <, ≠, ≥, >} and aᵢ, b ∈ D; usually, D is the integers or the reals³. We show that Res(LA) polynomially simulates the proof system R(lin) introduced by Raz and Tzameret [RT08]. This is interesting, as R(lin) has polynomial-size proofs of several difficult tautologies considered in proof complexity, such as the pigeonhole principle, Tseitin tautologies and the clique-colouring principle.

In the proof system R(lin), the atoms are linear equations over the integers. The input formula is a CNF over such equations, together with the clauses (xᵢ = 0 ∨ xᵢ = 1) for i ∈ [n], ensuring a 0/1 assignment. The rules of inference consist of a modified resolution rule, together with two structural rules, weakening and simplification:

**Weakening.** From a (possibly empty) clause A, derive (A ∨ L) for any equation L.
**Simplification.** From (A ∨ (k = 0)), where k ≠ 0 is a constant, derive A.

**R(lin)-cut.** Let (A ∨ L₁) and (B ∨ L₂) be two clauses containing linear equations L₁ and L₂, respectively. From these two clauses, derive the clause (A ∨ B ∨ (L₁ − L₂)).

**Proposition 14.** Res(LA) *polynomially simulates* R(lin)*.*

<sup>3</sup> Some definitions of linear arithmetic do not include disequalities; however, as disequalities and strict inequalities occur naturally in the SMT context, SMT-oriented linear arithmetic solvers do incorporate mechanisms for dealing with them.

*Proof.* We show how to simulate the rules of R(lin) in Res(LA). We can assume, without loss of generality, that Res(LA) has a weakening rule, which simulates the weakening of R(lin) directly. For the simplification rule, note that LA ⊨ (k ≠ 0) for any constant k ≠ 0; one application of the resolution rule to (k ≠ 0) and (A ∨ (k = 0)) yields A.

Finally, let L₁ be Σⁿᵢ₌₁ aᵢxᵢ = b and L₂ be Σⁿᵢ₌₁ cᵢxᵢ = d. From (A ∨ L₁) and (B ∨ L₂) we want to derive (A ∨ B ∨ (L₁ − L₂)). First derive in LA the clause C = (Σⁿᵢ₌₁ aᵢxᵢ ≠ b ∨ Σⁿᵢ₌₁ cᵢxᵢ ≠ d ∨ Σⁿᵢ₌₁ (aᵢ − cᵢ)xᵢ = b − d). Resolving (A ∨ L₁) with C, and then resolving the resulting clause with (B ∨ L₂), gives the desired (A ∨ B ∨ (L₁ − L₂)).
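The equation L₁ − L₂ appearing in this simulation can be computed symbolically; representing a linear equation as a pair (coefficient map, constant) is our own illustrative choice.

```python
def eq_sub(L1, L2):
    """Form the linear equation L1 - L2: subtract coefficients pointwise and
    subtract the constants, dropping zero coefficients."""
    (a, b), (c, d) = L1, L2
    coeffs = {x: a.get(x, 0) - c.get(x, 0) for x in set(a) | set(c)}
    return ({x: v for x, v in coeffs.items() if v != 0}, b - d)

# (2x + y = 5) - (x = 2) gives x + y = 3.
L1 = ({'x': 2, 'y': 1}, 5)
L2 = ({'x': 1}, 2)
print(eq_sub(L1, L2))  # prints ({'x': 1, 'y': 1}, 3)
```

Subtracting an equation from itself yields the trivial equation 0 = 0, matching the simplification rule's treatment of constant literals.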

Note that we did not need to specify whether LA is over the integers, rationals or reals, and hence the proof works for any of them. Also, in order to establish our simulations it suffices to consider a fragment of LA with only equalities and disequalities, and to produce only unit clauses and width-3 clauses of a fixed form.

**Corollary 15.** Res(LA) *has polynomial-size proofs of the pigeonhole principle, Tseitin tautologies and a clique-colouring principle for* k = (n) *size clique and* k = (log n)<sup>2</sup>/8 log log n *size colouring.*

# **6 Lazy vs. Eager Reductions and the Exponential Time Hypothesis**

Throughout this paper we have primarily discussed the *Lazy* approach to SMT. In this section, we consider the *Eager* approach, in which an input formula F over a theory T is reduced to an equisatisfiable propositional formula G, which is then solved using a suitable (Boolean) SAT solver.

The Eager approach is still used in several modern SMT solvers such as the STP solver for bit-vectors and arrays [GD07]. A common eager reduction used when solving equations over the theory of equality, E (or its generalization to uninterpreted function symbols EUF), is the *Ackermann reduction*. Let us first describe a simple version of the Ackermann reduction over the theory E.

Let F denote a CNF over literals from the theory E (so, each literal is of the form a = b for constant terms a, b) which we will ultimately transform into a Boolean SAT instance. Let n denote the number of constant terms occurring in F, let m denote the number of distinct literals occurring in F, and consider the literal a = b and the literal b = a to be the same. For each literal a = b introduce a Boolean variable x_{a=b}, and for each clause ⋁ᵢ (aᵢ = bᵢ) create a clause ⋁ᵢ x_{aᵢ=bᵢ}. To encode the transitivity of equality, for each triple of terms (a, b, c) occurring in the initial CNF F introduce a clause of the form ¬x_{a=b} ∨ ¬x_{b=c} ∨ x_{a=c}. Note that the final formula will have O(n²) Boolean variables, one for each possible atom a = b: a potential quadratic blow-up which is unavoidable in this encoding due to the transitivity axioms. Observe that this blow-up only occurs in the eager approach; in the lazy approach we only need to consider the literals a = b which occur in the original formula F. It is therefore natural to wonder whether this blow-up in the number of input variables can somehow be avoided.
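The naive eager reduction just described is easy to sketch in code. The function name and clause representation below are our own; negative occurrences of atoms are omitted for brevity, and transitivity is stated for all three rotations of each unordered triple so that every ordering is covered.

```python
from itertools import combinations

def eager_encode_equality(terms, clauses):
    """Naive eager reduction for theory E: each equality atom a = b becomes
    a Boolean variable, and transitivity is enforced by explicit clauses over
    all triples of terms. `clauses` is a list of clauses, each a list of
    pairs (a, b) standing for the positive atom a = b."""
    var = {}  # frozenset({a, b}) -> variable index; a = b and b = a coincide

    def atom(a, b):
        key = frozenset((a, b))
        if key not in var:
            var[key] = len(var) + 1
        return var[key]

    cnf = [[atom(a, b) for (a, b) in clause] for clause in clauses]
    for a, b, c in combinations(sorted(terms), 3):
        # Rotations of !x_{a=b} v !x_{b=c} v x_{a=c} for this triple.
        cnf.append([-atom(a, b), -atom(b, c), atom(a, c)])
        cnf.append([-atom(a, b), -atom(a, c), atom(b, c)])
        cnf.append([-atom(b, c), -atom(a, c), atom(a, b)])
    return cnf, var
```

On n terms this generates O(n²) variables and O(n³) transitivity clauses, exhibiting exactly the blow-up discussed above.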

In fact, one can construct a cleverer Eager reduction from E-SAT to SAT which introduces only O(n log n) Boolean variables; this encoding does not represent the literals a = b as Boolean variables x_{a=b}, and instead uses a more complicated pointer construction. This improved reduction turns out to be essentially the best possible under the well-known (and widely believed) *Exponential Time Hypothesis*, which is a strengthening of P ≠ NP.

**Exponential Time Hypothesis (ETH).** There is no deterministic or randomized algorithm for SAT running in time 2^{o(n)}, where n is the number of input variables.

**Theorem 16.** *Let* F *be an instance of* E*-SAT with* n *distinct terms. For any polynomial-time reduction* R *from* E*-SAT to SAT, the Boolean formula* R(F) *must have* Ω(n log n) *variables unless ETH fails.*

*Proof.* By way of contradiction, suppose that ETH holds and let R be a reduction from E-SAT to SAT which introduces o(n log n) variables. Let 2-CSP denote a constraint satisfaction problem with two variables per constraint. The theorem follows almost immediately from the following result of Traxler [Tra08].

**Theorem 17 (Theorem 1 in [Tra08], Rephrased).** *Consider any 2-CSP* C₁ ∧ C₂ ∧ · · · ∧ Cₘ *over an alphabet* Σ *of size* d*, where each constraint is of the form* x ≠ a ∨ y ≠ b *for variables* x, y *and constants* a, b ∈ Σ*. Unless ETH fails, every algorithm for this problem requires time* d^{cn} *for some universal constant* c > 0*.*

There is a simple reduction from the restriction of 2-CSP described in the above theorem to E-SAT. Introduce terms e₁, e₂, …, e_d, each intended to represent a symbol from the alphabet Σ, and also terms x₁, x₂, …, xₙ, one for each variable x occurring in the original CSP instance. Now, for each i ≠ j introduce the unit clause eᵢ ≠ eⱼ, and for each i ∈ [n] add a clause of the form xᵢ = e₁ ∨ xᵢ = e₂ ∨ · · · ∨ xᵢ = e_d. Finally, for each constraint of the 2-CSP of the form xᵢ ≠ a ∨ xⱼ ≠ b introduce a clause xᵢ ≠ eₐ ∨ xⱼ ≠ e_b, where eₐ, e_b are the terms corresponding to the symbols a, b. Let F′ denote the final E-SAT instance; it is clear that F′ is satisfiable if and only if the original 2-CSP is satisfiable, and that F′ has n + d constant terms.
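The three clause families of this reduction can be generated directly. The function name and the triple representation of (dis)equality literals below are our own illustrative choices.

```python
def csp_to_esat(n, d, constraints):
    """Reduction sketch from the 2-CSP of Theorem 17 to E-SAT. Variables are
    x_1..x_n over an alphabet of size d; each constraint (i, a, j, b) means
    (x_i != a or x_j != b). Clauses are lists of ('=' or '!=', s, t) triples."""
    clauses = []
    # The alphabet terms are pairwise distinct: e_i != e_j for i != j.
    for i in range(1, d + 1):
        for j in range(i + 1, d + 1):
            clauses.append([('!=', f'e{i}', f'e{j}')])
    # Every CSP variable takes some value: x_i = e_1 or ... or x_i = e_d.
    for i in range(1, n + 1):
        clauses.append([('=', f'x{i}', f'e{a}') for a in range(1, d + 1)])
    # One clause per CSP constraint.
    for (i, a, j, b) in constraints:
        clauses.append([('!=', f'x{i}', f'e{a}'), ('!=', f'x{j}', f'e{b}')])
    return clauses
```

The output has d(d−1)/2 distinctness clauses, n domain clauses and one clause per constraint, over n + d constant terms, as in the proof.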

Now, apply the reduction R to F′, obtaining a SAT instance R(F′). By assumption, the final SAT instance has o((n + d) log(n + d)) variables; running the standard brute-force algorithm for SAT therefore gives an algorithm running in time 2^{o((n+d) log(n+d))} for the 2-CSP variant described above. However, by the above theorem, every algorithm for this 2-CSP variant requires time at least d^{cn} = 2^{cn·log d}, which violates ETH when d ≈ n.

# **7 Conclusion**

In this paper, we studied SMT solvers through the lens of proof complexity, introducing a generalization of the resolution proof system and arguing that it correctly models the "lazy" SMT framework DPLL(T) [NOT06]. We further presented and analyzed a stronger system Res∗(T) that allows for the introduction of new literals, and showed that it models DPLL∗(T), a variant of an SMT solver that can introduce new theory literals; this captures the introduction of new literals in solvers such as Yices and Z3 [Yic, Z3].

There are many natural directions to pursue. First, although we have not considered it here, it is natural to introduce an *intermediate* proof system between Res(T) and Res∗(T) which is allowed to introduce new theory *literals* but *not* new theory *variables*. For instance, if we have the formula a = f(b) ∧ a = c in EUF, then this intermediate proof system could introduce the theory literal c = f(b) but *not* the theory literal f(c) = f(a), whereas both may be introduced in Res∗(T). It is not clear to us whether this intermediate system can simulate Frege, and we suggest studying it in its own right.

A second direction that we believe is quite interesting is extending our results on EUF to capture the *extended Frege* system, which is the most powerful proof system typically studied in propositional proof complexity. Intuitively, it seems that EUF by itself is not strong enough to capture extended Frege; we consider finding a new theory T which can capture it an interesting open problem.

# **References**

- [BHJ17] Balyo, T., Heule, M.J.H., Järvisalo, M.: SAT competition 2016: recent developments. In: Singh, S.P., Markovitch, S. (eds.) Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 4–9 February 2017, San Francisco, California, USA, pp. 5061–5063. AAAI Press (2017)
- [BKS04] Beame, P., Kautz, H.A., Sabharwal, A.: Towards understanding and harnessing the potential of clause learning. J. Artif. Intell. Res. **22**, 319–351 (2004)
- [BM14] Bjørner, N., de Moura, L.: Tractability and modern SMT solvers. In: Bordeaux, L., Hamadi, Y., Kohli, P. (eds.) Tractability: Practical Approaches to Hard Problems, pp. 350–377. Cambridge University Press (2014)
- [CR79] Cook, S.A., Reckhow, R.A.: The relative efficiency of propositional proof systems. J. Symb. Log. **44**(1), 36–50 (1979)
- [DST80] Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. J. ACM **27**(4), 758–771 (1980)
- [GD07] Ganesh, V., Dill, D.L.: A decision procedure for bit-vectors and arrays. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 519–531. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73368-3_52
- [Juk12] Jukna, S.: Boolean Function Complexity: Advances and Frontiers. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-24508-4
- [Kra98] Krajíček, J.: Discretely ordered modules as a first-order extension of the cutting planes proof system. J. Symb. Log. **63**(4), 1582–1596 (1998)
- [NOT06] Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT modulo theories. J. ACM **53**(6), 937–977 (2006)
- [ORC09] Oliveras, A., Rodríguez-Carbonell, E.: Combining decision procedures: the Nelson-Oppen approach. Techniques (2009)
- [PD11] Pipatsrisawat, K., Darwiche, A.: On the power of clause-learning SAT solvers as resolution engines. Artif. Intell. **175**(2), 512–525 (2011)
- [RT08] Raz, R., Tzameret, I.: Resolution over linear equations and multilinear proofs. Ann. Pure Appl. Log. **155**(3), 194–224 (2008)
- [smt] The Annual SMTCOMP Competition Website. http://www.smtcomp.org
- [Tin12] Tinelli, C.: Foundations of Lazy SMT and DPLL(T) (2012)
- [Tra08] Traxler, P.: The time complexity of constraint satisfaction. In: Grohe, M., Niedermeier, R. (eds.) IWPEC 2008. LNCS, vol. 5018, pp. 190–201. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79723-4_18
- [Yic] The Yices SMT Solver. http://yices.csl.sri.com/
- [Z3] The Z3 Theorem Prover. https://github.com/Z3Prover


# **Model Generation for Quantified Formulas: A Taint-Based Approach**

Benjamin Farinier<sup>1,2(B)</sup>, Sébastien Bardin<sup>1</sup>, Richard Bonichon<sup>1</sup>, and Marie-Laure Potet<sup>2</sup>

> <sup>1</sup> CEA, LIST, Software Safety and Security Lab, Université Paris-Saclay, Gif-sur-Yvette, France {benjamin.farinier,sebastien.bardin,richard.bonichon}@cea.fr <sup>2</sup> Univ. Grenoble Alpes, Verimag, Grenoble, France {benjamin.farinier,marie-laure.potet}@univ-grenoble-alpes.fr

**Abstract.** We focus in this paper on generating models of quantified first-order formulas over built-in theories, which is paramount in software verification and bug finding. While standard methods are either geared toward proving the absence of a solution or targeted to specific theories, we propose a generic and radically new approach based on a reduction to the quantifier-free case. Our technique thus allows us to reuse all the efficient machinery developed for that context. Experiments show a substantial improvement over state-of-the-art methods.

# **1 Introduction**

**Context.** Software verification methods have come to rely increasingly on reasoning over logical formulas modulo theory. In particular, the ability to generate models (i.e., find solutions) of a formula is of utmost importance, typically in the context of bug finding or intensive testing—symbolic execution [21] or bounded model checking [7]. Since *quantifier-free first-order formulas* on well-suited theories are sufficient to represent many reachability properties of interest, the Satisfiability Modulo Theory (SMT) [6,25] community has primarily dedicated itself to designing solvers able to efficiently handle such problems.

Yet, universal quantifiers are sometimes needed, typically when considering preconditions or code abstraction. Unfortunately, most theories handled by SMT-solvers are undecidable in the presence of universal quantifiers. There exist dedicated methods for a few decidable quantified theories, such as Presburger arithmetic [9] or the array property fragment [8], but there is no general and effective enough approach for the model generation problem over universally quantified formulas. Indeed, generic solutions for quantified formulas involving heuristic instantiation and refutation are best geared to proving the unsatisfiability of a formula (i.e., absence of solution) [13,20], while recent proposals such as local theory extensions [2], finite instantiation [31,32] or model-based instantiation [20,29] either are too narrow in scope, or handle quantifiers on free sorts only, or restrict themselves to finite models, or may get stuck in infinite refinement loops.

**Goal and Challenge.** Our goal is to propose a generic and efficient approach to the model generation problem over arbitrary quantified formulas with support for theories commonly found in software verification. Due to the huge effort made by the community to produce state-of-the-art solvers for quantifier-free theories (QF*-solvers*), it is highly desirable for this solution to be compatible with current leading decision procedures, namely SMT approaches.

**Proposal.** Our approach turns a quantified formula into a quantifier-free formula with the guarantee that any model of the latter contains a model of the former. The benefits are threefold: the transformed formula is easier to solve, it can be sent to standard QF-solvers, and a model for the initial formula is deducible from a model of the transformed one. The idea is to ignore quantifiers but strengthen the quantifier-free part of the formula with an *independence condition* constraining models to be independent from the (initially) quantified variables.

**Contributions.** This paper makes the following contributions:


**Discussions.** Our approach supplements state-of-the-art model generation on quantified formulas by providing a more generic handling of satisfiable problems. We can deal with quantifiers on any sort and we are not restricted to finite models. Moreover, this is a lightweight preprocessing approach requiring a single call to the underlying quantifier-free solver. The method also extends to *partial* elimination of universal quantifiers, or reduction to *quantified-but-decidable* formulas (Sect. 5.4).

While techniques *à la* E-matching lift quantifier-free solvers to the unsatisfiability checking of quantified formulas, this work provides a mechanism to lift them to the satisfiability checking and model generation of quantified formulas, yielding a more symmetric handling of quantified formulas in SMT. This new approach paves the way to future developments such as the definition of more precise inference mechanisms for independence conditions, the identification of interesting subclasses for which inferring weakest independence conditions is feasible, and the combination with other quantifier instantiation techniques.

# **2 Motivation**

Let us take the code sample in Fig. 1 and suppose we want to reach the function analyze_me. For this purpose, we need a model (a.k.a. solution) of the reachability condition φ ≜ ax + b > 0, where a, b and x are symbolic variables associated with the program variables a, b and x. However, while the values of a and b are user-controlled, the value of x is not. Therefore, if we want to reach analyze_me in a reproducible manner, we actually need a model of φ∀ ≜ ∀x. ax + b > 0, which *involves universal quantification*. While this specific formula is simple, model generation for quantified formulas is notoriously difficult: PSPACE-complete for booleans, undecidable for uninterpreted functions or arrays.

**Reduction to the Quantifier-Free Case Through Independence.** We propose to ignore the universal quantification over x, but *restrict models to those which do not depend on* x. For example, model {a = 1, x = 1, b = 0} does depend on x, as taking x = 0 invalidates the formula, while model {a = 0, x = 1, b = 1} is *independent of* x. We call the constraint ψ ≜ (a = 0) an *independence condition*: any interpretation of φ satisfying ψ will be independent of x, and therefore a model of φ ∧ ψ will give us a model of φ∀.

**Inference of Independence Conditions Through Tainting.** Figure 1 details in its right part a way to infer such independence conditions. Given a quantified reachability condition (1), we first associate to every variable v a (boolean) *taint variable* v• indicating whether the solution may depend on v (value ⊤) or not (value ⊥). Here, x• is set to ⊥, while a• and b• are set to ⊤ (2). An independence condition (3)—a formula modulo theory—is then constructed using both initial and taint variables. We extend taint constraints to terms, t• indicating here whether t may depend on x or not, and we require the top-level term (i.e., the formula) to be tainted to ⊤ (i.e., to be independent of x). Condition (3) reads as follows: in order to enforce that (ax + b > 0)• holds, we enforce that (ax)• and b• hold, and for (ax)• we require that either a• and x• hold, or a• holds and a = 0 (absorbing the value of x), or the symmetric case. We see that ·• is defined recursively and combines a *systematic part* (if t• holds then f(t)• holds, for any f) with a *theory-dependent part* (here, based on ×). After simplifications (4), we obtain a = 0 as an independence condition (5), which is adjoined to the reachability condition freed of its universal quantification (6). A QF-solver provides a model of (6) (e.g., {a = 0, b = 1, x = 5}), lifted into a model of (1) by discarding the valuation of x (e.g., {a = 0, b = 1}).

In this specific example the inferred independence condition (5) is the most generic one and (1) and (6) are equisatisfiable. Yet, in general it may be an under-approximation, constraining the variables more than needed and yielding a correct but incomplete decision method: a model of (6) can still be turned into a model of (1), but (6) might not have a model while (1) has.
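The recursive taint computation on this example can be reproduced in a few lines. The following is a minimal, hypothetical Python sketch (not the authors' Tfml tool): terms are nested tuples, and `indep_cond` combines the systematic taint part with the multiplication-by-zero theory rule used in the example.

```python
# Terms: ('var', name), ('const', value), or (op, t1, t2).
def indep_cond(t, targeted):
    """Condition forcing t to be independent of the targeted variables."""
    kind = t[0]
    if kind == 'const':
        return ('const', True)                  # constants depend on nothing
    if kind == 'var':
        return ('const', t[1] not in targeted)  # targeted variables are tainted
    c1, c2 = indep_cond(t[1], targeted), indep_cond(t[2], targeted)
    both = ('and', c1, c2)                      # systematic (taint) part
    if kind == '*':                             # theory part: 0 absorbs x in a*x
        return ('or', both,
                ('or', ('and', c1, ('=', t[1], ('const', 0))),
                       ('and', c2, ('=', t[2], ('const', 0)))))
    return both

def ev(t, env):
    """Evaluate a term under a valuation of its variables."""
    if t[0] == 'const':
        return t[1]
    if t[0] == 'var':
        return env[t[1]]
    l, r = ev(t[1], env), ev(t[2], env)
    return {'and': l and r, 'or': l or r, '=': l == r,
            '*': l * r, '+': l + r, '>': l > r}[t[0]]

# phi = a*x + b > 0, targeted variable x:
phi = ('>', ('+', ('*', ('var', 'a'), ('var', 'x')), ('var', 'b')), ('const', 0))
cond = indep_cond(phi, {'x'})
print(ev(cond, {'a': 0, 'b': 1, 'x': 5}))   # True: a = 0 grants independence
print(ev(cond, {'a': 1, 'b': 1, 'x': 5}))   # False
```

After boolean simplification the inferred condition collapses to a = 0, matching step (5) of Fig. 1.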

### **3 Notations**

We consider the framework of many-sorted first-order logic with equality, and we assume standard definitions of sorts, signatures and terms. Given a tuple of variables *x* ≜ (x1,...,xn) and a quantifier Q (∀ or ∃), we shorten Qx1 ... Qxn.Φ as Q*x*.Φ. A formula is in *prenex normal form* if it is written as Q1*x*1 ... Qn*x*n.Φ with Φ a quantifier-free formula. A formula is in *Skolem normal form* if it is in prenex normal form with only universal quantifiers. We write Φ(*x*) to denote that the free variables of Φ are in *x*. Let *t* ≜ (t1,...,tn) be a term tuple; we write Φ(*t*) for the formula obtained from Φ by replacing each occurrence of xi in Φ by ti. An *interpretation* I associates a domain to each sort of a signature and a value to each symbol of a formula, and ⟦Δ⟧I denotes the evaluation of term Δ over I. A *satisfiability relation* |= between interpretations and formulas is defined inductively as usual. A *model* of Φ is an interpretation I satisfying I |= Φ. We sometimes refer to models as "solutions". Formula Ψ *entails* formula Φ, written Ψ |= Φ, if every interpretation satisfying Ψ satisfies Φ as well. Two formulas are equivalent, denoted Ψ ≡ Φ, if they have the same models. A *theory* T ≜ (Σ, *I*) restricts the symbols in Σ to be interpreted in *I*. The quantifier-free fragment of T is denoted QF-T.

**Convention.** Letters a, b, c . . . denote uninterpreted symbols and variables. Letters x, y, z . . . denote quantified variables. *a*, *b*, *c* denote sets of uninterpreted symbols. *x*, *y*, *z* ... denote sets of quantified variables. Finally, a, b, c ... denote valuations of associated (sets of) symbols.

*In the rest of this paper, we assume w.l.o.g. that all formulas are in Skolem normal form. Recall that any formula* φ *in classical logic can be normalized into a formula* ψ *in Skolem normal form such that any model of* φ *can be lifted into a model of* ψ, *and vice versa. This strong relation, much closer to formula equivalence than to formula equisatisfiability, ensures that our correctness and completeness results throughout the paper hold for arbitrarily quantified formulas.*

*Companion Technical Report. Additional technical details (proofs, experiments, etc.) are available online at* http://benjamin.farinier.org/cav2018/.

# **4 Musing with Independence**

### **4.1 Independent Interpretations, Terms and Formulas**

A solution (x, a) of Φ does not depend on *x* if Φ(*x*, *a*) is always true or always false, for all possible valuations of *x* as long as *a* is set to a. More formally, we define the independence of an interpretation of Φ w.r.t. *x* as follows:

# **Definition 1 (Independent interpretation)**


Regarding formula ax + b > 0 from Fig. 1, {a = 0, b = 1, x = 1} is independent of x while {a = 1, b = 0, x = 1} is not. Considering term (t[a ← b]) [c], with t an array written at index a then read at index c, {a = 0, b = 42, c = 0, t = [... ]} is independent of t (evaluates to 42) while {a = 0, b = 1, c = 2, t = [... ]} is not (evaluates to t[2]). We now define independence for formulas and terms.

### **Definition 2 (Independent formula and term)**


Definition 2 of formula and term independence is far stronger than Definition 1 of interpretation independence. Indeed, it can easily be checked that if a formula Φ (resp. a term Δ) is independent of *x*, then any interpretation of Φ (resp. Δ) is independent of *x*. However, the converse is false: formula ax + b > 0 is not independent of x, but has an interpretation {a = 0, b = 1, x = 1} which is.
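Interpretation independence (Definition 1) can be checked by brute force on a toy domain. The sketch below is hypothetical illustrative Python, with a small integer range standing in for the infinite domain, applied to φ ≜ ax + b > 0:

```python
def phi(a, b, x):
    return a * x + b > 0

def interp_independent(a, b, xs=range(-10, 11)):
    """True iff phi keeps the same truth value for every sampled x."""
    return len({phi(a, b, x) for x in xs}) == 1

print(interp_independent(0, 1))   # True : {a = 0, b = 1} is independent of x
print(interp_independent(1, 0))   # False: the truth value flips with x
```

This only approximates independence (a finite sample of x), but it suffices to illustrate why {a = 0, b = 1, x = 1} qualifies while {a = 1, b = 0, x = 1} does not.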

### **4.2 Independence Conditions**

Since it is rarely the case that a formula (resp. term) is independent of a set of variables *x*, we are interested in *Sufficient Independence Conditions*: additional constraints that can be added to a formula (resp. term) so as to make it independent of *x*.

### **Definition 3 (Sufficient Independence Condition (SIC))**


We denote by sicΦ,*x* (resp. sicΔ,*x*) a Sufficient Independence Condition for a formula Φ(*x*, *a*) (resp. for a term Δ(*x*, *a*)) with regard to *x*. For example, a = 0 is a sicΦ,x for the formula Φ ≜ ax + b > 0, and a = c is a sicΔ,t for the term Δ ≜ (t[a ← b])[c]. Note that ⊥ is always a sic, and that sic are closed under ∧ and ∨. Proposition 1 clarifies the interest of sic for model generation.
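Both example sic can be validated exhaustively on a tiny finite domain. This is a sanity check rather than a proof, in hypothetical illustrative Python:

```python
import itertools

D = range(-2, 3)

# a = 0 is a sic for phi = a*x + b > 0 w.r.t. x: under it, x is irrelevant.
for a, b in itertools.product(D, D):
    if a == 0:
        assert len({a * x + b > 0 for x in D}) == 1

# a = c is a sic for (t[a <- b])[c] w.r.t. t: the read returns the stored b.
def select_store(t, a, b, c):
    return b if a == c else t[c]

for a, b, c in itertools.product(range(2), range(2), range(2)):
    if a == c:
        assert len({select_store(t, a, b, c)
                    for t in itertools.product(range(2), range(2))}) == 1

print("both candidate sic verified on the finite domain")
```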

**Proposition 1 (Model generalization).** *Let* Φ(*x*, *a*) *be a formula and* Ψ *a* sicΦ,*x*. *If there exists an interpretation* {x, a} *such that* {x, a} |= Ψ(*a*) ∧ Φ(*x*, *a*)*, then* {a} |= ∀*x*.Φ(*x*, *a*)*.*

*Proof (sketch).* Appendix C.1 of the companion technical report.

For the sake of completeness, we now introduce the notion of *Weakest Independence Condition* for a formula Φ(*x*, *a*) (resp. a term Δ(*x*, *a*)) with regard to *x*. We denote such conditions wicΦ,*x* (resp. wicΔ,*x*).

### **Definition 4 (Weakest Independence Condition (WIC))**


Note that Ω ≜ ∀*x*.∀*y*.(Φ(*x*, *a*) ⇔ Φ(*y*, *a*)) is always a wicΦ,*x*, and that a formula Π is a wicΦ,*x* if and only if Π ≡ Ω. Therefore all syntactically different wic have the same semantics. As an example, both sic a = 0 and a = c presented earlier are wic. Proposition 2 emphasizes the interest of wic for model generation.

**Proposition 2 (Model specialization).** *Let* Φ(*x*, *a*) *be a formula and* Π(*a*) *a* wicΦ,*x*. *If there exists an interpretation* {a} *such that* {a} |= ∀*x*.Φ(*x*, *a*)*, then* {x, a} |= Π(*a*) ∧ Φ(*x*, *a*) *for any valuation* x *of* x*.*

*Proof (sketch).* Appendix C.2 of the companion technical report.

From now on, our goal is to infer from a formula ∀*x*.Φ(*x*, *a*) a sicΦ,*x* Ψ(*a*), find a model of Ψ(*a*) ∧ Φ(*x*, *a*) and generalize it. This sicΦ,*x* should be as weak—in the sense "less coercive"—as possible, as otherwise ⊥ could always be used, which would not be very interesting for our overall purpose.

For the sake of simplicity, the previous definitions omit to mention the theory to which the sic belongs. If the theory T of the quantified formula is decidable we can always choose ∀*x*.∀*y*.(Φ(*x*, *a*) ⇔ Φ(*y*, *a*)) as a sic, but it is then simpler to use a T-solver directly. *The challenge is, for formulas in an undecidable theory* T*, to find a non-trivial* sic *in its quantifier-free fragment* QF-T*.*

Under this constraint, we cannot expect a systematic construction of wic, as it would allow us to decide the satisfiability of any quantified theory with a decidable quantifier-free fragment. Yet informally, the closer a sic is to being a wic, the closer our approach is to completeness. This notion might therefore be seen as a fair gauge of the quality of a sic. *Having said that, we leave a deeper study of the inference of* wic *as future work.*

### **5 Generic Framework for SIC-Based Model Generation**

We now describe our overall approach. Algorithm 1 presents our sic-based generic framework for model generation (Sect. 5.1). Then, Algorithm 2 proposes a taint-based approach for sic inference (Sect. 5.2). Finally, we discuss complexity and efficiency issues (Sect. 5.3) and detail extensions (Sect. 5.4), such as partial elimination.

*From now on, we do not distinguish anymore between terms and formulas, their treatment being symmetric, and we call targeted variables the variables we want to be independent of.*

### **5.1 SIC-Based Model Generation**

Our model generation technique is described in Algorithm 1. Function solveQ takes as input a formula ∀*x*.Φ(*x*, *a*) over a theory T. It first calculates a sicΦ,*x* Ψ(*a*) in QF-T. Then it solves Φ(*x*, *a*) ∧ Ψ(*a*). Finally, depending on the result and on whether Ψ(*a*) is a wicΦ,*x* or not, it answers sat, unsat or unknown. solveQ is parameterized by two functions, solveQF and inferSIC:

solveQF is a decision procedure (typically an SMT solver) for QF-T. solveQF is said to be *correct* if each time it answers sat (resp. unsat) the formula is satisfiable (resp. unsatisfiable); it is said to be *complete* if it always answers sat or unsat, never unknown.

**Algorithm 1.** SIC-based model generation for quantified formulas

```
Parameter: solveQF
  Input:  Φ(v) a formula in QF-T
  Output: sat(v) with v |= Φ, unsat or unknown
Parameter: inferSIC
  Input:  Φ a formula in QF-T, and x a set of targeted variables
  Output: Ψ a sicΦ,x in QF-T
Function solveQ:
  Input:  ∀x.Φ(x, a) a universally quantified formula over theory T
  Output: sat(a) with a |= ∀x.Φ(x, a), unsat or unknown
  Let Ψ(a) ≜ inferSIC(Φ(x, a), x)
  match solveQF(Φ(x, a) ∧ Ψ(a)) with
  | sat(x, a) → return sat(a)
  | unsat     → if Ψ is a wicΦ,x then return unsat else return unknown
  | unknown   → return unknown
```

inferSIC takes as input a formula Φ in QF-T and a set of targeted variables *x*, and produces a sicΦ,*x* in QF-T. It is said to be *correct* if it always returns a sic, and *complete* if all the sic it returns are wic. A possible implementation of inferSIC is described in Algorithm 2 (Sect. 5.2).
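The interplay of solveQ, solveQF and inferSIC can be sketched directly. The snippet below is a hypothetical Python rendering in which solveQF is a toy enumerator over a tiny finite domain rather than a real SMT solver, and the sic is hard-coded to a = 0 for the running example:

```python
import itertools

def ev(t, env):
    """Evaluate a tuple-encoded term under a valuation."""
    if t[0] == 'const': return t[1]
    if t[0] == 'var':   return env[t[1]]
    l, r = ev(t[1], env), ev(t[2], env)
    return {'and': l and r, 'or': l or r, '=': l == r,
            '*': l * r, '+': l + r, '>': l > r}[t[0]]

def solve_qf(formula, names=('a', 'b', 'x'), dom=range(-2, 3)):
    """Toy QF-'solver': exhaustive search over a tiny domain."""
    for vals in itertools.product(dom, repeat=len(names)):
        env = dict(zip(names, vals))
        if ev(formula, env):
            return 'sat', env
    return 'unsat', None

def solve_q(phi, targeted, infer_sic, is_wic=False):
    psi = infer_sic(phi, targeted)            # sic over the free symbols
    status, model = solve_qf(('and', phi, psi))
    if status == 'sat':                       # generalize: drop targeted vars
        return 'sat', {v: x for v, x in model.items() if v not in targeted}
    if status == 'unsat' and is_wic:
        return 'unsat', None
    return 'unknown', None

# forall x. a*x + b > 0, with the sic a = 0 supplied by hand:
phi = ('>', ('+', ('*', ('var', 'a'), ('var', 'x')), ('var', 'b')), ('const', 0))
print(solve_q(phi, {'x'},
              lambda f, xs: ('=', ('var', 'a'), ('const', 0))))
# → ('sat', {'a': 0, 'b': 1})
```

Since the supplied condition is not known to be a wic, an unsat answer from the toy solver would be reported as unknown, as in Algorithm 1.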

Function solveQ enjoys the two following properties, where correctness and completeness are defined as for solveQF.

#### **Theorem 1 (Correctness and completeness)**


*Proof (sketch).* Follows directly from Propositions 1 and 2 (Sect. 4.2).

#### **5.2 Taint-Based SIC Inference**

Algorithm 2 presents a taint-based implementation of function inferSIC. It consists of a (syntactic) core calculus, described here, refined by a (semantic) theory-dependent calculus theorySIC described in Sect. 6. From formula Φ(*x*, *a*) and targeted variables *x*, inferSIC is defined recursively as follows.

If Φ is a constant, it returns ⊤, as constants are independent of any variable. If Φ is a variable v, it returns ⊤ if the solution may depend on v (i.e., v ∉ *x*), and ⊥ otherwise. If Φ is a function f(φ1,...,φn), it first recursively computes for every sub-term φi a sicφi,*x* ψi. Then these results are sent together with Φ to theorySIC, which computes a sicΦ,*x* Ψ. The procedure returns the disjunction of Ψ and the conjunction of the ψi's. Note that the default value ⊥ of theorySIC is absorbed by the disjunction.

# **Algorithm 2.** Taint-based sic inference

```
Parameter: theorySIC
  Input:  f a function symbol, its parameters φi, x a set of targeted
          variables, and ψi their associated sicφi,x
  Output: Ψ a sicf(φi),x
  Default: return ⊥
Function inferSIC(Φ, x):
  Input:  Φ a formula and x a set of targeted variables
  Output: Ψ a sicΦ,x
  either Φ is a constant   → return ⊤
  either Φ is a variable v → return v ∉ x
  either Φ is a function f(φ1,...,φn) →
      Let ψi ≜ inferSIC(φi, x) for all i ∈ {1,...,n}
      Let Ψ  ≜ theorySIC(f, (φ1,...,φn), (ψ1,...,ψn), x)
      return Ψ ∨ ⋀i ψi
```
The intuition is that if the φi's are independent of *x*, then f(φ1,...,φn) is too. Algorithm 2 is therefore said to be *taint-based*: when theorySIC is left to its default value, it acts as a form of taint tracking [15,27] inside the formula.
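The core calculus of Algorithm 2 is a straightforward recursion. A hypothetical Python sketch (the authors' implementation is in OCaml), with theorySIC left to its default ⊥ — i.e. pure taint tracking — looks like this:

```python
FALSE, TRUE = ('const', False), ('const', True)

def infer_sic(t, targeted, theory_sic=lambda f, args, sics, xs: FALSE):
    """Return a sic for term t w.r.t. the targeted variables."""
    kind = t[0]
    if kind == 'const':
        return TRUE                              # constants are independent
    if kind == 'var':
        return ('const', t[1] not in targeted)   # bottom iff v is targeted
    args = t[1:]
    sics = [infer_sic(a, targeted, theory_sic) for a in args]
    psi = theory_sic(kind, args, sics, targeted)  # theory-dependent refinement
    conj = TRUE
    for s in sics:                                # conjunction of sub-term sics
        conj = ('and', conj, s)
    return ('or', psi, conj)                      # the default bottom is absorbed

def ev(t):
    """Evaluate a closed boolean condition."""
    if t[0] == 'const': return t[1]
    if t[0] == 'and':   return ev(t[1]) and ev(t[2])
    if t[0] == 'or':    return ev(t[1]) or ev(t[2])
    raise ValueError(t)

print(ev(infer_sic(('f', ('var', 'a'), ('var', 'b')), {'x'})))  # True
print(ev(infer_sic(('f', ('var', 'a'), ('var', 'x')), {'x'})))  # False
```

Plugging a real theorySIC (Sect. 6) only changes the psi argument of the final disjunction.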

**Proposition 3 (Correctness).** *Given a formula* Φ(*x*, *a*) *and assuming that theorySIC is correct, inferSIC*(Φ, *x*) *indeed computes a* sicΦ,*x*.

*Proof (sketch).* This proof has been mechanized in Coq<sup>1</sup>.

Note that on the other hand, completeness does not hold: in general inferSIC does not compute a wic, cf. discussion in Sect. 5.4.

### **5.3 Complexity and Efficiency**

We now evaluate the overhead induced by Algorithm 1 in terms of formula size and complexity of the resolution—the running time of Algorithm 1 itself being expected to be negligible (preprocessing).

**Definition 5.** *The size of a term is inductively defined as size*(x) ≜ 1 *for* x *a variable, and size*(f(t1,...,tn)) ≜ 1 + Σi *size*(ti) *otherwise. We say that theorySIC is bounded in size if there exists* K *such that, for all terms* Δ*, size*(*theorySIC*(Δ, ·)) ≤ K*.*

**Proposition 4 (Size bound).** *Let* N *be the maximal arity of symbols defined by theory* T*. If theorySIC is bounded in size by* K*, then for every formula* Φ *in* T*, size*(*inferSIC*(Φ, ·)) ≤ (K + N) · *size*(Φ)*.*

<sup>1</sup> http://benjamin.farinier.org/cav2018/.

**Proposition 5 (Complexity bound).** *Let us suppose theorySIC bounded in size, and let* Φ *be a formula belonging to a theory* T *with polynomial-time checkable solutions. If* Ψ *is a* sicΦ,· *produced by inferSIC, then a solution for* Φ ∧ Ψ *is checkable in time polynomial in the size of* Φ*.*

*Proof (sketch).* Appendices C.3 and C.4 of the companion technical report.

These propositions demonstrate that, for formulas in sufficiently complex theories, our method lifts QF-solvers to the quantified case (in an approximated way) without any significant overhead, as long as theorySIC is bounded in size. This latter constraint can be achieved by systematically binding sub-terms to (constant-size) fresh names and having theorySIC manipulate these binders.

### **5.4 Discussions**

**Extension.** Let us remark that our framework encompasses partial quantifier elimination as long as the remaining quantifiers are handled by solveQF. For example, we may want to remove quantifications over arrays but keep those over bitvectors. In this setting, inferSIC can also allow some level of quantification, provided that solveQF handles it.

**About WIC.** As already stated, inferSIC does not propagate wic in general. For example, consider the formulas t1 ≜ (x < 0) and t2 ≜ (x ≥ 0): then wict1,x = ⊥ and wict2,x = ⊥. Hence inferSIC returns ⊥ as sic for t1 ∨ t2, while actually wict1∨t2,x = ⊤.

Nevertheless, we can already highlight a few cases where wic can be computed. (1) inferSIC does propagate wic on one-to-one uninterpreted functions. (2) If no variable of *x* appears in any sub-term of f(t, t′), then the associated wic is ⊤. While a priori naive, this case becomes interesting when combined with simplifications (Sect. 7.1) that may eliminate *x*. (3) If a sub-term falls in a sub-theory admitting quantifier elimination, then the associated wic is computed by eliminating quantifiers in ∀*x*.∀*y*.(Φ(*x*, *a*) ⇔ Φ(*y*, *a*)). (4) We may also think of dedicated patterns: regarding bitvectors, the wic for x ≤ a ⇒ x ≤ x + k is a ≤ Max − k. *Identifying under which conditions* wic *propagation holds is a strong direction for future work.*

### **6 Theory-Dependent SIC Refinements**

We now present theory-dependent sic refinements for theories relevant to program analysis: booleans, fixed-size bitvectors and arrays—recall that uninterpreted functions are already handled by Algorithm 2. We then propose a generalization of these refinements together with a correctness proof for a larger class of operators.

### **6.1 Refinement on Theories**

We recall that theorySIC takes four parameters: a function symbol f, its arguments (t1,...,tn), their associated sic (t•1,...,t•n), and the targeted variables *x*. theorySIC pattern-matches the function symbol and returns the associated sic according to the rules in Fig. 2. If a function symbol is not supported, the default value ⊥ is returned. Constants and variables are handled by inferSIC. For the sake of simplicity, the rules in Fig. 2 are defined recursively, but they can easily fit the interface required for theorySIC in Algorithm 2 by turning recursive calls into parameters.

**Booleans and Ite.** The rules for the boolean theory (Fig. 2a) handle ⇒, ∧, ∨ and ite (if-then-else). For binary operators, the sic is the conjunction of the sic associated to one of the two sub-terms and a constraint on this sub-term that forces the result of the operator to be constant—e.g., to be equal to ⊥ (resp. ⊤) for the antecedent (resp. consequent) of an implication. These equality constraints are based on the absorbing elements of the operators.

Inference for the ite operator is more subtle. Intuitively, if its condition is independent of some *x*, we use it to select the sic*x* of the sub-term that will be selected by the ite operator. If the condition depends on *x*, then we cannot use it to select a sic*x*. In this case, we return the conjunction of the sic*x* of both sub-terms together with the constraint that the two sub-terms are equal.
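These absorption-based rules can be sketched by evaluating them under a concrete interpretation: each function below takes the two sub-term values t1, t2 and their (already evaluated) sic values s1, s2, and returns whether the rule grants independence. This is a hypothetical Python rendering of the idea behind Fig. 2a, not the authors' exact rules:

```python
def sic_and(t1, s1, t2, s2):
    # one independent side equal to the absorbing element (False) suffices
    return (s1 and s2) or (s1 and t1 is False) or (s2 and t2 is False)

def sic_or(t1, s1, t2, s2):
    return (s1 and s2) or (s1 and t1 is True) or (s2 and t2 is True)

def sic_implies(t1, s1, t2, s2):
    # False absorbs the antecedent, True absorbs the consequent
    return (s1 and s2) or (s1 and t1 is False) or (s2 and t2 is True)

def sic_ite(c, sc, t1, s1, t2, s2):
    if sc:                               # independent condition: select a branch
        return s1 if c else s2
    return s1 and s2 and t1 == t2        # tainted condition: branches must agree

# Right operand tainted (s2 = False): an independent False on the left
# absorbs it in 'and', but not in 'or', where only True absorbs.
print(sic_and(False, True, True, False))   # True
print(sic_or(False, True, True, False))    # False
```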

**Fig. 2.** Examples of refinements for theorySIC

**Bitvectors and Arrays.** The rules for bitvectors (Fig. 2b) follow similar ideas, with the constants ⊤ (resp. ⊥) substituted by 1<sup>n</sup> (resp. 0<sup>n</sup>), the bitvector of size n full of ones (resp. zeros). The rules for arrays (Fig. 2c) are derived from the theory axioms. The definition is recursive: rules need to be applied until reaching either a store at the position where the select occurs, or the initial array variable.

As a rule of thumb, good sic can be derived from function axioms in the form of rewriting rules, as done for arrays. Similar constructions can be obtained for example for stacks or queues.

# **6.2** *R***-Absorbing Functions**

We propose a generalization of the previous theory-dependent sic refinements to a larger class of functions, and prove its correctness.

Intuitively, if a function has an absorbing element, constraining one of its operands to be equal to this element ensures that the result of the function is independent of the other operands. However, this is not enough when a relation between some elements is needed, as with (t[a ← b])[c], where the constraint a = c ensures independence with regard to t. We thus generalize the notion of absorption to R-absorption, where R is a relation between function arguments.

**Definition 6.** *Let* f : τ1 × ··· × τn → τ *be a function.* f *is* R*-absorbing if there exist* IR ⊂ {1, ..., n} *and a relation* R *between the* αi : τi, i ∈ IR*, such that for all* b ≜ (b1,...,bn) *and* c ≜ (c1,...,cn) ∈ τ1 × ··· × τn*, if* R(b|IR) *and* b|IR = c|IR*, where* ·|IR *is the projection on* IR*, then* f(b) = f(c)*.*

I<sup>R</sup> *is called the support of the relation of absorption* R*.*

For example, (a, b) → a ∨ b has two pairs ⟨R, IR⟩ coinciding with the usual notion of absorption, ⟨a = ⊤, {1a}⟩ and ⟨b = ⊤, {2b}⟩. Function (x, y, z) → x·y + z has among others the pair ⟨x = 0, {1x, 3z}⟩, while (a, b, c, t) → (t[a ← b])[c] has the pair ⟨a = c, {1a, 3c}⟩. We can now state the following proposition:

**Proposition 6.** *Let* f(t1,...,tn) *be an* R*-absorbing function of support* IR*, and let* t•i *be a* sicti,*x* *for some* *x*. *Then* R((ti)i∈IR) ∧ ⋀i∈IR t•i *is a* sicf,*x*.

*Proof (sketch).* Appendix C.5 of the companion technical report.

The previous examples (Sect. 6.1) can be recast in terms of R-absorbing functions, proving their correctness (cf. companion technical report). Note that regarding our end goal, we should accept only R-absorbing functions in QF-T.
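R-absorption can be sanity-checked by exhaustive enumeration on a small domain. The sketch below (hypothetical illustrative Python, with a finite stand-in for the real domains) verifies it for f(x, y, z) = x·y + z with R: x = 0 and the support containing x and z:

```python
import itertools

D = range(-2, 3)

def f(x, y, z):
    return x * y + z

triples = list(itertools.product(D, D, D))
for b, c in itertools.product(triples, repeat=2):
    # R holds on b's support and b, c agree on the support positions {x, z}:
    if b[0] == 0 and (b[0], b[2]) == (c[0], c[2]):
        assert f(*b) == f(*c)     # y, outside the support, is irrelevant
print("R-absorption verified on the finite domain")
```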

# **7 Experimental Evaluation**

This section describes the implementation of our method (Sect. 7.1) for bitvectors and arrays (ABV), together with experimental evaluation (Sect. 7.2).

### **7.1 Implementation**

Our prototype Tfml (*Taint engine for ForMuLa*)<sup>2</sup> comprises 7 klocs of OCaml. Given an input formula in the SMT-LIB format [5] (ABV theory), Tfml performs several normalizations before adding taint information following Algorithm 1. The process ends with simplifications as taint usually introduces many constant values, and a new SMT-LIB formula is output.

**Sharing with Let-Binding.** This stage is crucial, as it avoids term duplication in theorySIC (Algorithm 2, Sect. 5.3, and Proposition 4). We introduce new names for relevant sub-terms in order to share them easily.
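A minimal sketch of such sharing (hypothetical Python with made-up names; Tfml's actual scheme may differ): each compound sub-term is bound once to a fresh let-name, so a duplicated sub-term is emitted only once.

```python
def share(t, bindings, cache):
    """Bind every compound sub-term to a fresh name, reusing cached ones."""
    if not isinstance(t, tuple):            # atoms need no binding
        return t
    body = (t[0],) + tuple(share(a, bindings, cache) for a in t[1:])
    if body not in cache:
        name = f"_t{len(bindings)}"         # fresh let-bound name
        cache[body] = name
        bindings.append((name, body))
    return cache[body]

bindings, cache = [], {}
top = share(('>', ('+', ('*', 'a', 'x'), 'b'), 0), bindings, cache)
dup = share(('*', 'a', 'x'), bindings, cache)   # duplicate sub-term
print(top, dup, len(bindings))                  # _t2 _t0 3
```

The `bindings` list then becomes a chain of SMT-LIB let-bindings, so the tainted copy of a term can refer to the same binder as the original.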

**Simplifications.** We perform constant propagation and rewriting (standard rules, e.g. x − x → 0 or x × 1 → x) on both initial and transformed formulas – equality is soundly approximated by syntactic equality.

<sup>2</sup> http://benjamin.farinier.org/cav2018/.

**Shadow Arrays.** We encode taint constraints over arrays through *shadow arrays*. For each array declared in the formula, we declare a (taint) shadow array. The default value for all cells of the shadow array is the taint of the original array, and for each value stored (resp. read) in the original array, we store (resp. read) the taint of the value in the shadow array. As logical arrays are infinite, we cannot constrain all the values contained in the initial shadow array. Instead, we rely on a common trick in array theory: we constrain only cells corresponding to a relevant read index in the formula.

**Iterative Skolemization.** While we have assumed throughout the paper that formulas are Skolemized, we must be more careful in practice. Indeed, Skolemization introduces dependencies between a Skolemized variable and all its preceding universally quantified variables, blurring our analysis and likely causing the whole formula to be considered dependent. Instead, we follow an iterative process: 1. Skolemize the first block of existentially quantified variables; 2. compute the independence condition for any targeted variable in the first block of universal quantifiers and remove these quantifiers; 3. repeat. This yields a full Skolemization together with the construction of an independence condition, while avoiding many unnecessary dependencies.
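The dependency benefit can be sketched on a toy prenex prefix, represented as a list of `('exists'|'forall', [vars])` blocks. The `independent` oracle below stands in for the inferred independence condition of step 2 and is an assumption of this sketch, not the paper's algorithm:

```python
# Naive Skolemization: every existential depends on all preceding
# universals. Iterative Skolemization drops universals whose
# independence condition has been established (modeled by the
# 'independent' oracle), so later Skolem terms stay smaller.

def naive_skolemize(prefix):
    deps, universals = {}, []
    for kind, block in prefix:
        if kind == 'exists':
            for v in block:
                deps[v] = tuple(universals)
        else:
            universals.extend(block)
    return deps

def iterative_skolemize(prefix, independent):
    deps, universals = {}, []
    for kind, block in prefix:
        if kind == 'exists':
            for v in block:
                deps[v] = tuple(universals)
        else:
            # Universals shown independent are removed (step 2).
            universals.extend(u for u in block if not independent(u))
    return deps
```

With an oracle that always succeeds, a later existential depends on no universal at all; with one that never succeeds, the result degenerates to naive Skolemization.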

# **7.2 Evaluation**

**Objective.** We experimentally evaluate the following research questions: *RQ1* How does our approach perform with regard to state-of-the-art approaches for model generation of quantified formulas? *RQ2* How effective is it at lifting quantifier-free solvers into (sat-only) quantified solvers? *RQ3* How efficient is it in terms of preprocessing time and formula size overhead? We evaluate our method on a set of formulas combining arrays and bitvectors (paramount in software verification), against state-of-the-art solvers for these theories.

**Protocol.** The experiments below were run on an Intel(R) Xeon(R) E5-2660 v3 @ 2.60 GHz, with 4 GB RAM per process and a timeout of 1000 s per formula.


**Table 1.** Answers and resolution time (in seconds, including timeouts)

Solver•: solver enhanced with our method. Z3E, CVC4E: essentially E-matching



**Table 2.** Complementarity of our approach with existing solvers (sat instances)

**Results.** Tables 1 and 2 and Fig. 3 sum up our experimental results, which have all been cross-checked for consistency. Table 1 reports the number of successes (sat or unsat) and failures (unknown), plus total solving times. The • sign indicates formulas preprocessed with our approach. In that case it is impossible to correctly answer unsat (no wic checking), so the unsat line is N/A. Since Boolector does not support quantified ABV formulas, we only give results with our approach enabled. Table 1 reads as follows: of the 1269 SMT-LIB formulas, standalone Z3 solves 426 formulas (261 sat, 165 unsat), and 366 (all sat) if preprocessed. Interestingly, our approach always improves the underlying solver in terms of solved (sat) instances, either significantly (SMT-LIB) or modestly (Binsec). Yet, recall that in a software verification setting every win matters (a possibly new bug found or new assertion proved). For Z3•, it also strongly reduces computation time. Last but not least, Boolector• (a pure QF solver) turns out to have the best performance on sat instances, beating state-of-the-art approaches both in number of solved instances and in computation time.

Table 2 substantiates the complementarity of the different methods and reads as follows: for SMT-LIB, Boolector• solves 224 (sat) formulas missed by Z3, while Z3 solves 86 (sat) formulas missed by Boolector•, and 485 (sat) formulas are solved by at least one of them.

Figure 3 shows that formula size increases 9-fold on average (min 3, max 12); yet the formulas are easier to solve because they are more constrained. Regarding the performance and overhead of the tainting process, *taint time is almost always less than 1 s* in our experiments (not shown here), with a worst case of 4 minutes, clearly dominated by resolution time. The worst case is due to a pass of linearithmic complexity which can be optimized to be logarithmic.

**Fig. 3.** Overhead in formula size

**Pearls.** We show hereafter two particular applications of our method. Table 3 reports the results of another symbolic execution experiment, on the GRUB example.

On this example, Boolector• completely outperforms existing approaches. As a second application, while the main drawback of our method is that it precludes proving unsat, this is easily mitigated by complementing the approach with one geared toward (or able to handle) proving unsat, yielding efficient solvers for quantified formulas, as shown in Table 4.

**Conclusion.** Experiments demonstrate the relevance of our taint-based technique for model generation. (*RQ1*) Results in Table 1 show that our approach greatly facilitates the resolution process. *On these examples, our method performs better than state-of-the-art solvers but also strongly complements them (Table* 2*).* (*RQ2*) Moreover, Table 1 demonstrates that our technique is highly effective at lifting quantifier-free solvers to quantified formulas, in both the number of sat answers and computation time. *Indeed, once lifted, Boolector performs better (for* sat *only) than Z3 or CVC4 with full quantifier support.* Finally, (*RQ3*) our tainting method itself is very efficient both in time and space, making it suitable either as a preprocessing step or for deeper integration into a solver. In our current prototype implementation, we consider the cost to be low. *The companion technical report contains a few additional experiments on bitvectors and integer arithmetic, including the example from Fig.* 1*.*

**Table 3.** GRUB example

**Table 4.** Best approaches

# **8 Related Work**

Traditional approaches to solving quantified formulas essentially involve either generic methods geared toward proving unsatisfiability and validity [16], or complete but dedicated approaches for particular theories [8,36]. Besides these, some recent methods [20,22,31] aim to be correct and complete for larger classes of theories.

**Generic Method for Unsatisfiability.** Broadly speaking, these methods iteratively instantiate axioms until a contradiction is found. They are generic w.r.t. the underlying theory and allow the reuse of standard theory solvers, but termination is not guaranteed. Also, they are better suited to proving unsatisfiability than to finding models. In this family, E-matching [13,16] shows reasonable cost when combined with conflict-based instantiation [30] or semantic triggers [17,18]. In pure first-order logic (without theories), quantifiers are mainly handled through resolution and superposition [1,26], as done in Vampire [24,33] and E [34].

**Complete Methods for Specific Theories.** Much work has been done on designing complete decision procedures for quantified theories of interest, notably array properties [8], quantified theory of bitvectors [23,36], Presburger arithmetic or Real Linear Arithmetic [9,19]. Yet, they usually come at a high cost.

**Generic Methods for Model Generation.** Some recent works detail attempts at more general approaches to model generation.

*Local theory extensions* [2,22] provide means to extend some decidable theories with free symbols and quantification while retaining decidability. The approach identifies specific forms of formulas and (bounded) quantifications such that these theory extensions can be solved using finite instantiation of quantifiers together with a decision procedure for the original theory. The main drawback is that the formula size can increase significantly.

*Model-based quantifier instantiation* is an active area of research, notably developed in Z3 and CVC4. The basic idea is to use the partial model under construction to find the right quantifier instantiations, typically in a try-and-refine manner. Depending on the variant, these methods favor either satisfiability or unsatisfiability. They build on the underlying quantifier-free solver and can be mixed with E-matching techniques, yet each refinement yields a solver call and the refinement process may not terminate. Ge and de Moura [20] study decidable fragments of first-order logic modulo theories for which model-based quantifier instantiation yields soundness and refutational completeness. Reynolds *et al.* [30], Barbosa [3], and Preiner *et al.* [28] use models to guide the instantiation process towards instances refuting the current model. *Finite model quantifier instantiation* [31,32] reduces the search to finite models and is indeed geared toward model generation rather than unsatisfiability. Similar techniques have been used in program synthesis [29].

We drop support for the unsatisfiable case but gain flexibility: we deal with quantifiers on any sort, and the approach terminates and is lightweight, in the sense that it requires a single call to the underlying quantifier-free solver.

**Other.** Our method can be seen as taking inspiration from program taint analysis [15,27], developed for checking the non-interference [35] of public and secret inputs in security-sensitive programs. As far as the analogy goes, our approach should not be seen as checking non-interference, but rather as inferring preconditions of non-interference. Moreover, our formula-tainting technique is closer to dynamic program tainting than to static program tainting, in the sense that precise dependency conditions are statically inserted at preprocessing time, then precisely explored at solving time.

Finally, Darvas *et al.* [11] present a bottom-up formula-strengthening method. Their goal differs from ours, as they are interested in formula well-definedness (rather than independence) and validity (rather than model generation).

### **9 Conclusion**

This paper addresses the problem of generating models of quantified first-order formulas over built-in theories. We propose a correct and generic approach based on a reduction to the quantifier-free case through the inference of independence conditions. The technique is applicable to any theory with a decidable quantifier-free case and allows reuse of all the work done on quantifier-free solvers. The method significantly enhances the performance of state-of-the-art SMT solvers on the quantified case and supplements the latest advances in the field.

Future developments aim to tackle the definition of more precise inference mechanisms of independence conditions, the identification of interesting subclasses for which inferring weakest independence conditions is feasible, and the combination with other quantifier instantiation techniques.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Concurrency

# **Partial Order Aware Concurrency Sampling**

Xinhao Yuan, Junfeng Yang, and Ronghui Gu

> Columbia University, New York, USA {xinhaoyuan,junfeng,rgu}@cs.columbia.edu

**Abstract.** We present POS, a concurrency testing approach that samples the partial order of concurrent programs. POS uses a novel priority-based scheduling algorithm that dynamically reassigns priorities based on the partial-order information and formally ensures that each partial order will be explored with significant probability. POS is simple to implement and provides a probabilistic guarantee of error detection better than state-of-the-art sampling approaches. Evaluations show that POS is effective in covering the partial-order space of micro-benchmarks and finding concurrency bugs in real-world programs, such as Firefox's JavaScript engine SpiderMonkey.

# **1 Introduction**

Concurrent programs are notoriously difficult to test. Executions of different threads can interleave arbitrarily, and any such interleaving may trigger unexpected errors and lead to serious production failures [13]. Traditional testing of concurrent programs relies on the system scheduler to interleave executions (or events) and is limited in detecting bugs, because some interleavings are tested repeatedly while many others are missed.

*Systematic testing* [9,16,18,28–30], instead of relying on the system scheduler, utilizes formal methods to systematically schedule concurrent events and attempts to cover all possible interleavings. However, the interleaving space of concurrent programs is exponential in the execution length and often far exceeds the testing budget, leading to the so-called *state-space explosion* problem. Techniques such as partial order reduction (POR) [1,2,8,10] and dynamic interface reduction [11] have been introduced to reduce the interleaving space. But, in most cases, the reduced space of a complex concurrent program is still too large to test exhaustively. Moreover, systematic testing often uses a deterministic search algorithm (e.g., depth-first search) that only slightly adjusts the interleaving at each iteration, e.g., flipping the order of two events. Such a search may very well get stuck in a homogeneous interleaving subspace and waste the testing budget by exploring mostly equivalent interleavings.

To mitigate the state-space explosion problem, randomized scheduling algorithms have been proposed to *sample*, rather than enumerate, the interleaving space


**Fig. 1.** (a) An example illustrating random walk's weakness in probabilistic guarantee of error detection, where variable x is initially 0; (b) An example illustrating PCT's redundancy in exploring the partial order.

while still keeping the diversity of the interleavings explored [28]. The most straightforward sampling algorithm is *random walk*: at each step, randomly pick an *enabled* event to execute. Previous work showed that even such sampling outperformed exhaustive search at finding errors in real-world concurrent programs [24]. This can be explained by applying the *small-scope hypothesis* [12, Sect. 5.1.3] to the domain of concurrency error detection [17]: errors in real-world concurrent programs are non-adversarial and can often be triggered if a small number of events happen in the right order, which sampling has a good probability of achieving.

Random walk, however, has an unsurprisingly poor probabilistic guarantee of error detection. Consider the program in Fig. 1a. The assertion of thread A fails if, and only if, the statement "x=1" of thread B is executed before this assertion. Without knowing a priori which order (between the assertion and "x=1") triggers this failure, we should sample both orders uniformly, because the probabilistic guarantee of detecting this error is the *minimum* sampling probability of these two orders. Unfortunately, random walk may yield extremely non-uniform sampling probabilities for different orders when only a small number of events matter. In this example, to trigger the failure, the assertion of thread A has to be delayed (i.e., not picked) m times by random walk, making its probabilistic guarantee as low as 1/2<sup>m</sup>.
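The 1/2<sup>m</sup> bound can be checked with a small exact computation over this schedule space; the recursive model below is ours, for illustration only:

```python
from functools import lru_cache

# Exact probability that a uniform random walk runs all of thread B's
# b_left remaining events before thread A's a_left pending events
# (with a_left = 1, this is the Fig. 1a failure scenario: the assertion
# must be delayed past every event of B).

@lru_cache(maxsize=None)
def p_b_first(a_left, b_left):
    if b_left == 0:
        return 1.0      # B finished before A's assertion ran
    if a_left == 0:
        return 0.0      # the assertion already ran
    # Both threads enabled: random walk picks each with probability 1/2.
    return 0.5 * p_b_first(a_left - 1, b_left) \
         + 0.5 * p_b_first(a_left, b_left - 1)
```

For a single pending assertion, `p_b_first(1, m)` collapses to 1/2<sup>m</sup>, matching the bound in the text.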

To sample different orders more uniformly, *Probabilistic Concurrency Testing* (PCT) [4] relies on a user-provided parameter d, the number of events to delay: it randomly picks d events within the execution and inserts a preemption before each of them. Since the events are picked randomly by PCT, the corresponding interleaving space is sampled more uniformly, resulting in a much stronger probabilistic guarantee than random walk. Consider the program in Fig. 1a again. To trigger the failure, no event needs to be delayed, other than having the right thread (i.e., thread B) run first. Thus, the probability to trigger (or avoid) the failure is 1/2, which is much higher than 1/2<sup>m</sup>.

However, PCT does not consider the partial order of events entailed by a concurrent program, so the explored interleavings are still quite redundant. Consider the example in Fig. 1b. Both A1 and B1 are executed before the barrier and do not race with any statement. Statements A2 and B2 form a race, and so do statements A3 and B3. Depending on how each race is resolved, the program events have a total of four different partial orders. However, without considering the effects of barriers, PCT will attempt to delay A1 or B1 in vain. Furthermore, without considering the race conditions, PCT may first test an interleaving A2 → A3 → B2 → B3 (by delaying A3 and B2), and then test a partial-order equivalent and thus completely redundant interleaving A2 → B2 → A3 → B3 (by delaying A3 and B3). Such redundancies in PCT waste testing resources and weaken the probabilistic guarantee.

Towards addressing the above challenges, this paper makes three main contributions. First, we present a concurrency testing approach, named *partial order sampling* (POS), that samples concurrent program executions based on their partial orders and provides strong probabilistic guarantees of error detection. In contrast to the sophisticated algorithms and heavy bookkeeping used in prior POR work, the core algorithm of POS is much more straightforward. In POS, each event is assigned a random priority and, at each step, the event with the highest priority is executed. After each execution, all events that race with the executed event are reassigned fresh random priorities. Since each event has its own priority, POS (1) samples the orders of a group of dependent events uniformly and (2) uses one execution to sample independent event groups in parallel, both benefiting its probabilistic guarantee. The priority reassignment is also critical. Consider racing events e1 and e2, and an initial priority assignment that runs e1 first. Without priority reassignment, e2 may very well be delayed again when a new racing event e3 occurs, because e2's priority is likely to be small (the reason that e2 was delayed after e1 in the first place). Priority reassignment ensures that POS samples the two orders of e2 and e3 uniformly.
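The core loop just described can be sketched as follows. The program encoding (each thread as an ordered list of accessed objects) and the simplified race check (different threads, same object) are illustrative assumptions, not the paper's implementation:

```python
import random

# POS sketch: run the highest-priority enabled event; after each step,
# reassign fresh priorities to enabled events racing with it.
# An event is a (thread id, object) pair.

def pos_schedule(threads, rng):
    pos = {t: 0 for t in threads}    # next event index per thread
    prio = {}                        # (thread, index) -> priority
    trace = []
    while True:
        enabled = [t for t in threads if pos[t] < len(threads[t])]
        if not enabled:
            return trace
        # Lazily assign each newly enabled event a random priority.
        for t in enabled:
            prio.setdefault((t, pos[t]), rng.random())
        # Execute the enabled event with the highest priority.
        t = max(enabled, key=lambda u: prio[(u, pos[u])])
        obj = threads[t][pos[t]]
        trace.append((t, obj))
        pos[t] += 1
        # Reassign fresh priorities to enabled events racing with the
        # executed event (different thread, same object).
        for u in enabled:
            if u != t and threads[u][pos[u]] == obj:
                prio[(u, pos[u])] = rng.random()
```

Each call yields one sampled schedule; repeating with fresh randomness samples the partial-order space.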

Secondly, the probabilistic guarantee of POS has been formally analyzed and shown to be exponentially stronger than that of random walk and PCT for general programs. The probability for POS to execute any partial order can be calculated by modeling the ordering constraints as a bipartite graph and computing the probability that these constraints are satisfied by a random priority assignment. Although prior POR work typically has soundness proofs of the space reduction [1,8], those proofs depend on an exhaustive search strategy, and it is unclear how they can be adapted to randomized algorithms. Some randomized algorithms leverage POR to heuristically avoid redundant exploration, but no formal analysis of their probabilistic guarantees is given [22,28]. To the best of our knowledge, POS is the first work to sample partial orders with a formal probabilistic guarantee of error detection.

Lastly, POS has been implemented and evaluated using both randomly generated programs and real-world concurrent software, such as Firefox's JavaScript engine SpiderMonkey in SCTBench [24]. Our POS implementation supports shared-memory multithreaded programs using Pthreads. The evaluation results show that POS provided 134.1× stronger overall guarantees than random walk and PCT on randomly generated programs, and detected errors 2.6× faster than random walk and PCT on SCTBench. POS managed to find the six most difficult bugs in SCTBench with the highest probability among all algorithms evaluated and performed the best on 20 of the 32 non-trivial bugs in our evaluation.

*Related Work.* There is a rich literature on concurrency testing. Systematic testing [9,14,18,28] exhaustively enumerates all possible schedules of a program and thus suffers from the state-space explosion problem. Partial order reduction techniques [1,2,8,10] alleviate this problem by avoiding schedules that are redundant under partial order equivalence, but they rely on bookkeeping a massive exploration history to identify redundancy, and it is unclear how they can be applied to sampling methods.

PCT [4] explores schedules containing orderings of small sets of events and guarantees probabilistic coverage of bugs involving rare orders of a small number of events. PCT, however, does not take partial orders into account and becomes ineffective when dealing with a large number of ordering events. Also, the need for user-provided parameters diminishes the coverage guarantee, as the parameters are often provided imprecisely. Chistikov et al. [5] introduced hitting families to cover all admissible total orders of a set of events. However, this approach may cover redundant total orders that correspond to the same partial order. RAPOS [22] leverages ideas from partial order reduction, resembling our work in its goal, but does not provide a formal proof of its probabilistic guarantee. Our micro-benchmarks show that POS has a 5.0× overall advantage over RAPOS (see Sect. 6.1).

Coverage-driven concurrency testing [26,32] leverages relaxed coverage metrics to discover rarely explored interleavings. Directed testing [21,23] focuses on exploring specific types of interleavings, such as data races and atomicity violations, to reveal bugs. There is a large body of other work on detecting concurrency bugs using static analysis [19,25] or dynamic analysis [7,15,20]. But none of them can be effectively applied to real-world software systems while still providing formal probabilistic guarantees.

# **2 Running Example**

Figure 2 shows the running example of this paper. In this example, we assume that memory accesses are sequentially consistent and that all shared variables (e.g., x, w, etc.) are initialized to 0. The program consists of two threads, A and B. Thread B will be blocked at B4 by wait(w) until w > 0. Thread A will set w to 1 at A3 via signal(w) and unblock thread B. The assertion at A4 will fail if, and only if, the program is executed in the following total order:

$$\mathsf{B1} \to \mathsf{A1} \to \mathsf{B2} \to \mathsf{B3} \to \mathsf{A2} \to \mathsf{A3} \to \mathsf{B4} \to \mathsf{B5} \to \mathsf{B6} \to \mathsf{A4}$$

To detect this bug, random walk has to make the correct choice at every step. Among all ten steps, three have only a single option: A2 and A3 must be executed first to enable B4, and A4 is the only statement left at the last step. Thus, the probability of reaching the bug is 1/2<sup>7</sup> = 1/128. As for PCT, we have to insert two preemption points just before statements B2 and A2 among ten statements; thus the probability for PCT is 1/10 × 1/10 × 1/2 = 1/200, where the 1/2 comes from the requirement that thread B has to be executed first.

**Fig. 2.** The running example involving two threads.

In POS, this bug can be detected with a substantial probability of 1/48, much higher than other approaches. Indeed, our formal guarantees ensure that any behavior of this program can be covered with a probability of at least 1/60.

### **3 Preliminary**

*Concurrent Machine Model.* Our concurrent abstract machine models a *finite* set of processes and a set of shared objects. The machine state is denoted as s and consists of the local state of each process and the state of the shared objects. The abstract machine assumes sequential consistency and allows arbitrary interleaving among all processes. At each step, starting from s, any running process can be randomly selected to make a move, updating the state to s′ and generating an event e, denoted as s −e→ s′.

An event e is a tuple e := (pid, intr, obj, ind), where pid is the process ID, intr is the statement (or instruction) pointer, obj is the shared object accessed by this step (we assume each statement accesses at most one shared object), and ind indicates how many times this intr has been executed, distinguishing different runs of the same instruction. For example, the execution of the statement "A2: y++" in Fig. 2 will generate the event (A, A2, y, 0). Such an event captures the information of the corresponding step and can be used to replay the execution. In other words, given the starting state s and the event e, the resulting state s′ of a step "−e→" is determined.
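For concreteness, the event tuple can be rendered as a small sketch; the field types and string encodings below are illustrative assumptions:

```python
from typing import NamedTuple

# The event tuple from the text: process ID, statement pointer,
# accessed shared object, and an execution index distinguishing
# repeated runs of the same statement.

class Event(NamedTuple):
    pid: str   # process ID
    intr: str  # statement (instruction) pointer
    obj: str   # shared object accessed (at most one per statement)
    ind: int   # number of earlier executions of this statement

# First execution of "A2: y++" from Fig. 2:
a2 = Event(pid="A", intr="A2", obj="y", ind=0)
```

Since `NamedTuple` instances compare like plain tuples, an event doubles as the replayable record described in the text.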

A trace t is a list of events generated by a sequence of program transitions (or steps) starting from the initial machine state (denoted as s0). For example, the following program execution:

$$s\_0 \xrightarrow{e\_0} s\_1 \xrightarrow{e\_1} \cdots \xrightarrow{e\_n} s\_{n+1}$$

generates the trace t := e0 • e1 • ··· • en, where the symbol "•" means "cons-ing" an event onto the trace. Trace events can be accessed by index (e.g., t[1] = e1).

A trace can be used to replay a sequence of executions. In other words, given the initial machine state s<sup>0</sup> and the trace t, the resulting state of running t (denoted as "State(t)") is determined.

We write En(s) := {e | ∃s′, s −e→ s′} for the set of events *enabled* (or allowed to be executed) at state s. Take the program in Fig. 2 as an example: initially, both A1 and B1 can be executed, and the corresponding two events form the enabled set En(s0). The blocking wait at B4, however, can be enabled only after being signaled at A3. A state s is called a *terminating* state if, and only if, En(s) = ∅. We assume that any disabled event will eventually become enabled and that every process must end in either a terminating state or an error state. This implies that all traces are finite. For readability, we often abbreviate En(State(t)), i.e., the enabled event set after executing trace t, as En(t).

*Partial Order of Traces.* Two events e<sup>0</sup> and e<sup>1</sup> are called *independent* events (denoted as <sup>e</sup>0⊥e1) if, and only if, they neither belong to the same process nor access the same object:

$$e\_0 \bot e\_1 := (e\_0.\mathtt{pid} \neq e\_1.\mathtt{pid}) \land (e\_0.\mathtt{obj} \neq e\_1.\mathtt{obj})$$

The execution order of independent events does not affect the resulting state. If a trace t′ can be generated by swapping adjacent and independent events of another trace t, then the two traces t and t′ are *partial order equivalent*. Intuitively, partial order equivalent traces are guaranteed to lead the program to the same state. The *partial order* of a trace is characterized by the orders between all *dependent* events plus their *transitive closure*. Given a trace t, its partial order relation "⊏t" is defined as the *minimal* relation over its events that satisfies:

$$(1) \ \forall i \ j, \ i < j \ \land \ \neg(t[i] \bot t[j]) \implies t[i] \sqsubset\_t t[j] \qquad (2) \ \forall i \ j \ k, \ t[i] \sqsubset\_t t[j] \ \land \ t[j] \sqsubset\_t t[k] \implies t[i] \sqsubset\_t t[k]$$

Two traces with the same partial order relation and the same event set must be partial order equivalent.
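The definition above can be made executable. A sketch, assuming events encoded as (pid, obj) pairs (the encoding and helper names are illustrative):

```python
# Compute the partial-order relation of a trace: order every dependent
# pair by trace position, then take the transitive closure.

def independent(e0, e1):
    # e0 is independent of e1: different process and different object.
    return e0[0] != e1[0] and e0[1] != e1[1]

def partial_order(t):
    # (1) order dependent pairs by position in the trace.
    order = {(t[i], t[j])
             for i in range(len(t)) for j in range(i + 1, len(t))
             if not independent(t[i], t[j])}
    # (2) transitive closure.
    changed = True
    while changed:
        changed = False
        for (a, b) in list(order):
            for (c, d) in list(order):
                if b == c and (a, d) not in order:
                    order.add((a, d))
                    changed = True
    return order
```

The sketch assumes events in a trace are distinct, which holds here because the ind field distinguishes repeated runs of a statement.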

Given an event order E and its order relation ⊏E, we say a trace t *follows* E and write "t ≃ E" if, and only if,

$$\forall e\_0 \; e\_1, \; e\_0 \sqsubset\_t e\_1 \implies e\_0 \sqsubset\_\mathcal{E} e\_1$$

We write "<sup>t</sup> <sup>|</sup><sup>=</sup> <sup>E</sup>" to denote that <sup>E</sup> is exactly the partial order of trace <sup>t</sup>:

$$t \models \mathcal{E} := \ \forall e\_0 \ e\_1, \ e\_0 \sqsubset\_t e\_1 \iff e\_0 \sqsubset\_{\mathcal{E}} e\_1$$

*Probabilistic Error-Detection Guarantees.* Each partial order of a concurrent program may lead to a different and potentially incorrect outcome. Therefore, every possible partial order has to be explored. The *minimum* probability over these explorations is called the probabilistic error-detection guarantee of a randomized scheduler.

Algorithm 1 presents a framework to formally reason about this guarantee. A sampling procedure Sample samples a terminating trace t of a program. It starts


**Algorithm 1.** Sample a terminating trace with a randomized scheduler

1: **procedure** Sample(Sch, R)
2: t ← [ ]
3: **while** En(t) ≠ ∅ **do**
4: e ← Sch(En(t), R)
5: t ← t • e
6: **end while**
7: **return** t
8: **end procedure**

with the empty trace and repeatedly invokes a randomized scheduler (denoted as Sch) to append an event to the trace until the program terminates. The randomized scheduler Sch selects an enabled event from En(t) and the randomness comes from the random variable parameter, i.e., R.

A naive scheduler can be purely random without any strategy. A sophisticated scheduler may utilize additional information, such as the properties of the current trace and the enabled event set.
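Algorithm 1 together with the naive random-walk scheduler can be sketched as follows; the thread-list program model is an assumption made for illustration:

```python
import random

# Executable rendering of Algorithm 1: Sample repeatedly asks a
# scheduler for the next enabled event until none remains.

def sample(threads, scheduler, rng):
    pos = {t: 0 for t in threads}
    trace = []
    while True:
        # En(t): one next event per unfinished thread.
        en = [(t, threads[t][pos[t]]) for t in threads
              if pos[t] < len(threads[t])]
        if not en:
            return trace
        t, e = scheduler(en, rng)   # e <- Sch(En(t), R)
        trace.append(e)             # t <- t . e
        pos[t] += 1

# The naive scheduler: random walk, i.e. pick any enabled event
# uniformly at random.
def random_walk(en, rng):
    return rng.choice(en)
```

Swapping `random_walk` for a priority-based scheduler yields the schemes analyzed in Sect. 4 without changing the sampling loop.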

Given the randomized scheduler Sch on R and any partial order E of a program, we write "P(Sample(Sch, <sup>R</sup>) <sup>|</sup><sup>=</sup> <sup>E</sup>)" to denote the probability of covering E, i.e., generating a trace whose partial order is exactly E using Algorithm 1. The *probabilistic error-detection guarantee* of the scheduler Sch on R is then defined as the minimum probability of covering the partial order E of any terminating trace of the program:

$$\min\_{\mathcal{E}} \ P(\mathsf{Sample}(\mathsf{Sch}, R) \models \mathcal{E})$$

### **4 POS - Algorithm and Analysis**

In this section, we first present BasicPOS, a priority-based scheduler and analyze its probability of covering a given partial order (see Sect. 4.1). Based on the analysis of BasicPOS, we then show that such a priority-based algorithm can be dramatically improved by introducing the *priority reassignment*, resulting in our POS algorithm (see Sect. 4.2). Finally, we present how to calculate the *probabilistic error-detection guarantee* of POS on general programs (see Sect. 4.3).

#### **4.1 BasicPOS**

In BasicPOS, each event is associated with a random and immutable priority, and, at each step, the enabled event with the highest priority is picked to execute. We use Pri to denote the map from events to priorities and describe BasicPOS in Algorithm 2, which instantiates the random variable R in Algorithm 1 with Pri. The priority Pri(e) of each event e is independent of the others and follows the uniform distribution U(0, 1).
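As a sketch of this behavior (not the paper's artifact), assuming a toy program model where each thread is an ordered list of events:

```python
import random

# BasicPOS sketch: each event receives one immutable priority drawn
# from U(0, 1); at every step the enabled event with the highest
# priority executes.

def basic_pos(threads, rng):
    pri = {}                       # Pri: (thread, index) -> priority
    pos = {t: 0 for t in threads}
    trace = []
    while True:
        en = [t for t in threads if pos[t] < len(threads[t])]
        if not en:
            return trace
        for t in en:               # draw each event's priority once
            pri.setdefault((t, pos[t]), rng.random())
        t = max(en, key=lambda u: pri[(u, pos[u])])
        trace.append(threads[t][pos[t]])
        pos[t] += 1
```

Unlike full POS, no priority is ever reassigned here, which is exactly the weakness analyzed next.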

We now consider under what conditions BasicPOS would sample a trace that follows a given partial order E of a program. This means that the generated trace t,


**Algorithm 2.** Sample a trace with BasicPOS under the priority map Pri

at the end of each loop iteration (line 5 in Algorithm 2), must satisfy the invariant "t ≃ E". Thus, the event priorities have to be ordered properly: given a trace t satisfying "t ≃ E", the enabled event e∗ with the highest priority must satisfy "t • e∗ ≃ E". In other words, given "t ≃ E", for any e ∈ En(t) with "t • e ≄ E", there must be some e′ ∈ En(t) satisfying "t • e′ ≃ E", together with a priority map in which e′ has a higher priority, i.e., Pri(e′) > Pri(e); then e will not be selected as the event e∗ at line 4 in Algorithm 2. The following Lemma 1 shows that such an event e′ always exists:

### **Lemma 1**

$$\begin{array}{c} \forall t \ e, \ t \simeq \mathcal{E} \ \land \ e \in \mathsf{En}(t) \ \land \ t \bullet e \not\simeq \mathcal{E} \\ \implies \exists e', \ e' \in \mathsf{En}(t) \ \land \ t \bullet e' \simeq \mathcal{E} \ \land \ e' \sqsubset\_{\mathcal{E}} e \end{array}$$

*Proof.* We prove the lemma by contradiction. Since traces are finite, we assume that some traces are counterexamples to the lemma and that t is the longest such trace. In other words, we have t ≃ E and there exists e ∈ En(t) with t • e ≄ E such that:

$$\forall e', \ e' \in \mathbf{En}(t) \land t \bullet e' \simeq \mathcal{E} \implies \neg(e' \sqsubset\_{\mathcal{E}} e) \tag{1}$$

Since E is the partial order of a terminating trace and the trace t has not terminated yet, there must exist an event e′ ∈ En(t) such that t • e′ ≃ E. Let t′ := t • e′; by (1), we have ¬(e′ ⊏<sub>E</sub> e) and

$$\begin{array}{l} e \in \mathsf{En}(t') \\ \land\ t' \bullet e \not\simeq \mathcal{E} \\ \land\ \forall e'',\ e'' \in \mathsf{En}(t') \ \land\ t' \bullet e'' \simeq \mathcal{E} \implies \neg(e'' \sqsubset_{\mathcal{E}} e) \end{array}$$

The first two statements are immediate. The third one also holds: otherwise, some e′′ with e′′ ⊏<sub>E</sub> e would be implied by the transitivity of the partial order through e′, contradicting (1). Thus, t′ is a counterexample that is longer than t, contradicting our assumption.

Thanks to Lemma 1, we then only need to construct a priority map in which such an e′ has a higher priority. Let "e ⋈<sub>E</sub> e′ := ∃t, t ≃ E ∧ {e, e′} ⊆ En(t)" denote that e and e′ can be *simultaneously enabled* under E. We write

$$\mathsf{PS}_{\mathcal{E}}(e) := \{ e' \mid e' \sqsubset_{\mathcal{E}} e \ \land\ e \bowtie_{\mathcal{E}} e' \},$$

for the set of events that can be simultaneously enabled with e but have to be selected prior to e in order to follow E. Any e′ given by Lemma 1 must belong to PS<sub>E</sub>(e). Let V<sub>E</sub> be the event set ordered by E. The priority map Pri can then be constructed as follows:

$$\bigwedge_{e \in V_{\mathcal{E}},\ e' \in \mathsf{PS}_{\mathcal{E}}(e)} \mathbf{Pri}(e) < \mathbf{Pri}(e') \tag{Cond-BasicPOS}$$

The traces sampled by BasicPOS using this Pri will always follow E.

Although (Cond-BasicPOS) is not a necessary condition for sampling a trace that follows a desired partial order, in our observation it gives a good estimate of the worst case. It also exposes the major weakness of BasicPOS: the *constraint propagation* of priorities. An event e with a large PS<sub>E</sub>(e) set may receive a relatively low priority, since its priority has to be lower than that of every event in PS<sub>E</sub>(e). Thus, for any simultaneously enabled event e′ that has to be delayed until after e, Pri(e′) must be even smaller than Pri(e), which is unnecessarily hard to satisfy for a random Pri(e′). Due to this constraint propagation, the probability that a priority map Pri satisfies (Cond-BasicPOS) can be as low as 1/|V<sub>E</sub>|!.

Here, we explain how BasicPOS samples the following trace that triggers the bug described in Sect. 2:

$$\begin{array}{c} t\_{bug} := (\mathsf{B}, \mathsf{B1}, \mathsf{x}, \mathsf{0}) \bullet (\mathsf{A}, \mathsf{A1}, \mathsf{x}, \mathsf{0}) \bullet (\mathsf{B}, \mathsf{B2}, \mathsf{x}, \mathsf{0}) \bullet (\mathsf{B}, \mathsf{B3}, \mathsf{y}, \mathsf{0}) \bullet (\mathsf{A}, \mathsf{A2}, \mathsf{y}, \mathsf{0}) \\\ \bullet (\mathsf{A}, \mathsf{A3}, \mathsf{w}, \mathsf{0}) \bullet (\mathsf{B}, \mathsf{B4}, \mathsf{w}, \mathsf{0}) \bullet (\mathsf{B}, \mathsf{B5}, \mathsf{y}, \mathsf{0}) \bullet (\mathsf{B}, \mathsf{B6}, \mathsf{z}, \mathsf{0}) \bullet (\mathsf{A}, \mathsf{A4}, \mathsf{z}, \mathsf{0}) \end{array}$$

To sample trace t*bug* , according to (Cond-BasicPOS), the priority map has to satisfy the following constraints:

$$\begin{array}{ll}
\mathbf{Pri}(t_{bug}[0] = (\mathsf{B}, \mathsf{B1}, \mathsf{x}, \mathsf{0})) &> \mathbf{Pri}(t_{bug}[1] = (\mathsf{A}, \mathsf{A1}, \mathsf{x}, \mathsf{0})) \\
\mathbf{Pri}(t_{bug}[1]) &> \mathbf{Pri}(t_{bug}[2] = (\mathsf{B}, \mathsf{B2}, \mathsf{x}, \mathsf{0})) \\
\mathbf{Pri}(t_{bug}[2]) &> \mathbf{Pri}(t_{bug}[4] = (\mathsf{A}, \mathsf{A2}, \mathsf{y}, \mathsf{0})) \\
\mathbf{Pri}(t_{bug}[3] = (\mathsf{B}, \mathsf{B3}, \mathsf{y}, \mathsf{0})) &> \mathbf{Pri}(t_{bug}[4]) \\
\mathbf{Pri}(t_{bug}[6] = (\mathsf{B}, \mathsf{B4}, \mathsf{w}, \mathsf{0})) &> \mathbf{Pri}(t_{bug}[9] = (\mathsf{A}, \mathsf{A4}, \mathsf{z}, \mathsf{0})) \\
\mathbf{Pri}(t_{bug}[7] = (\mathsf{B}, \mathsf{B5}, \mathsf{y}, \mathsf{0})) &> \mathbf{Pri}(t_{bug}[9]) \\
\mathbf{Pri}(t_{bug}[8] = (\mathsf{B}, \mathsf{B6}, \mathsf{z}, \mathsf{0})) &> \mathbf{Pri}(t_{bug}[9])
\end{array}$$

Note that these are also the necessary constraints for BasicPOS to follow the partial order of t*bug* . The probability that a random Pri satisfies the constraints is 1/120. The propagation of the constraints can be illustrated by the first three steps:

$$\mathbf{Pri}(t\_{bug}[0]) > \mathbf{Pri}(t\_{bug}[1]) > \mathbf{Pri}(t\_{bug}[2]) > \mathbf{Pri}(t\_{bug}[4])$$

which happens with probability 1/24. Random walk, on the other hand, samples these three steps with probability 1/8, since it makes an independent binary choice at each of the three steps.

### **4.2 POS**

We will now show how to improve BasicPOS by eliminating the propagation of priority constraints. Consider the situation where an event e (delayed at some trace t) becomes eligible to schedule right after some event e′ is scheduled, i.e.,

$$t \simeq \mathcal{E} \land \{e, e'\} \subseteq \mathsf{En}(t) \land t \bullet e \not\simeq \mathcal{E} \land t \bullet e' \bullet e \simeq \mathcal{E}$$

If we reset the priority of e right after scheduling e′, the constraints causing the delay of e will not be propagated to the events e′′ with e ∈ PS<sub>E</sub>(e′′). However, there is no way for us to know which e should be reset after which e′ during sampling, since E is unknown and not provided. Notice that

$$t \simeq \mathcal{E} \land \{e, e'\} \subseteq \mathsf{En}(t) \land t \bullet e \not\simeq \mathcal{E} \land \ t \bullet e' \bullet e \simeq \mathcal{E} \implies e.\mathsf{obj} = e'.\mathsf{obj}$$

If we instead reset the priorities of all the events that access the same object as e′, the propagation of priority constraints is also eliminated.

To analyze how POS follows E under the reassignment scheme, we have to model how many priorities need to be reset at each step. Note that blindly reassigning the priorities of all delayed events at each step would be suboptimal, as it degenerates the algorithm to random walk. To give a formal and more precise analysis, we introduce object index functions for a trace t and a partial order E:

$$\begin{array}{ll} \mathbf{I}(t, e) := & |\{ e' \mid e' \in t \ \land\ e.\mathsf{obj} = e'.\mathsf{obj} \}| \\ \mathbf{I}_{\mathcal{E}}(e) := & |\{ e' \mid e' \sqsubset_{\mathcal{E}} e \ \land\ e.\mathsf{obj} = e'.\mathsf{obj} \}| \end{array}$$

Intuitively, when e ∈ En(t), scheduling e on t operates on e.obj after I(t, e) previous events. A trace t follows E if every step (indicated by t[i]) operates on the object t[i].obj after I<sub>E</sub>(t[i]) previous events in the trace.

We then index (or version) the priority of event e using the index function, as Pri(e, I(t, e)), and introduce POS, shown in Algorithm 3. By proving that

$$\forall e',\ \mathbf{I}(t, e) \le \mathbf{I}(t \bullet e', e) \ \land\ (\mathbf{I}(t, e) = \mathbf{I}(t \bullet e', e) \iff e.\mathsf{obj} \ne e'.\mathsf{obj})$$

we have that scheduling an event e′ *increases* the priority version of all the events accessing e′.obj, resulting in the priority reassignment.

We can then prove that the following statements hold:

$$\begin{array}{l} \forall t\ e,\ t \simeq \mathcal{E} \land e \in \mathsf{En}(t) \implies (t \bullet e \simeq \mathcal{E} \iff \mathbf{I}(t, e) = \mathbf{I}_{\mathcal{E}}(e)) \\ \forall t\ e,\ t \simeq \mathcal{E} \land e \in \mathsf{En}(t) \land t \bullet e \not\simeq \mathcal{E} \implies \mathbf{I}(t, e) < \mathbf{I}_{\mathcal{E}}(e) \end{array}$$

To ensure that the selection of e∗ on trace t follows E at line 4 of Algorithm 3, any e satisfying I(t, e) < I<sub>E</sub>(e) must have a smaller priority than some e′ satisfying I(t, e′) = I<sub>E</sub>(e′), and such an e′ must exist by Lemma 1. The priority constraints for POS to sample E are thus as below:

$$\bigwedge \mathbf{Pri}(e, i) < \mathbf{Pri}(e', \mathbf{I}_{\mathcal{E}}(e')) \quad \text{for some } i < \mathbf{I}_{\mathcal{E}}(e)$$

which is bipartite, so the propagation of priority constraints is eliminated. The effectiveness of POS is guaranteed by Theorem 1.

```
Pri ∼ U(0, 1) for every (event, index) pair

1: procedure SamplePOS(Pri)
2:     t ← [ ]
3:     while En(t) ≠ ∅ do
4:         e∗ ← arg max_{e ∈ En(t)} Pri(e, I(t, e))
5:         t ← t • e∗
6:     end while
7:     return t
8: end procedure
```
**Algorithm 3.** Sample a trace with POS under the indexed priority map Pri
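Algorithm 3 can be sketched directly in Python. The state interface and the `obj_of` helper are our assumptions for illustration; the key point is that versioning priorities by the per-object execution count implicitly re-randomizes every enabled event on the object just scheduled:

```python
import random
from collections import defaultdict

def sample_pos(initial_state, obj_of):
    """One POS run (Algorithm 3): the priority of e is looked up at version
    I(t, e), the number of already-executed events on e's object.  Executing
    an event on object o bumps o's counter, so all enabled events on o
    implicitly receive fresh priorities.  `obj_of(e)` maps events to objects."""
    pri = {}                    # (event, version) -> U(0,1), drawn lazily
    version = defaultdict(int)  # object -> number of executed events on it
    state, trace = initial_state, []
    while state.enabled():
        def priority(e):
            key = (e, version[obj_of(e)])   # key encodes Pri(e, I(t, e))
            if key not in pri:
                pri[key] = random.random()
            return pri[key]
        e_star = max(state.enabled(), key=priority)
        trace.append(e_star)
        version[obj_of(e_star)] += 1        # the priority reassignment
        state = state.execute(e_star)
    return trace
```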

**Theorem 1.** *Given any partial order* E *of a program with* P > 1 *processes, let*

$$\mathcal{D}_{\mathcal{E}} := |\{ (e, e') \mid e \bowtie_{\mathcal{E}} e' \ \land\ e \perp e' \ \land\ e \sqsubset_{\mathcal{E}} e' \}|$$

*be the number of races in* E*. Then the probability that POS samples a trace following* E *is at least*

$$\left(\frac{1}{\mathcal{P}}\right)^{|V_{\mathcal{E}}|} R^{U}$$

*where* R = P × |V<sub>E</sub>|/(|V<sub>E</sub>| + D<sub>E</sub>) ≥ 1 *and* U = (|V<sub>E</sub>| − D<sub>E</sub>/(P − 1))/2 ≥ 0.

Please refer to the technical report [33] for the detailed proof and the construction of priority constraints.

Here, we show how POS improves BasicPOS over the example in Sect. 2. The priority constraints for POS to sample the partial order of t*bug* are as below:

$$\begin{array}{ll}
\mathbf{Pri}(t_{bug}[0], 0) &> \mathbf{Pri}(t_{bug}[1], 0) \\
\mathbf{Pri}(t_{bug}[1], 1) &> \mathbf{Pri}(t_{bug}[2], 1) \\
\mathbf{Pri}(t_{bug}[2], 2) &> \mathbf{Pri}(t_{bug}[4], 0) \\
\mathbf{Pri}(t_{bug}[3], 0) &> \mathbf{Pri}(t_{bug}[4], 0) \\
\mathbf{Pri}(t_{bug}[6], 1) &> \mathbf{Pri}(t_{bug}[9], 0) \\
\mathbf{Pri}(t_{bug}[7], 2) &> \mathbf{Pri}(t_{bug}[9], 0) \\
\mathbf{Pri}(t_{bug}[8], 0) &> \mathbf{Pri}(t_{bug}[9], 0)
\end{array}$$

Since each Pri(e, i) is an independent sample from U(0, 1), the probability of Pri satisfying the constraints is 1/2 × 1/2 × 1/3 × 1/4 = 1/48.
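The 1/48 figure can be checked mechanically: because each versioned priority occurs in exactly one constraint group, a sink constrained by k distinct sources must be the minimum of k + 1 i.i.d. uniforms, which happens with probability 1/(k + 1), and the groups multiply. A small sketch (the (event index, version) tuple encoding is ours):

```python
from collections import defaultdict
from fractions import Fraction

def bipartite_constraint_probability(constraints):
    """Probability that i.i.d. U(0,1) versioned priorities satisfy a set of
    `(hi, lo)` constraints Pri[hi] > Pri[lo], assuming each versioned
    priority appears in at most one group (the bipartite shape POS yields):
    a sink with k distinct sources is their common minimum w.p. 1/(k+1)."""
    sources = defaultdict(set)
    for hi, lo in constraints:
        sources[lo].add(hi)
    p = Fraction(1)
    for his in sources.values():
        p *= Fraction(1, len(his) + 1)
    return p

# The seven constraints above, encoded as (event index, priority version):
t_bug_constraints = [
    ((0, 0), (1, 0)), ((1, 1), (2, 1)),
    ((2, 2), (4, 0)), ((3, 0), (4, 0)),
    ((6, 1), (9, 0)), ((7, 2), (9, 0)), ((8, 0), (9, 0)),
]
```

Evaluating `bipartite_constraint_probability(t_bug_constraints)` reproduces the product 1/2 × 1/2 × 1/3 × 1/4.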

### **4.3 Probability Guarantee of POS on General Programs**

We now analyze how POS performs on general programs compared to random walk and PCT. Consider a program with P processes and N total events. It is common for a program to have a substantial number of non-racing events, for example, accesses to shared variables protected by locks, semaphores, or condition variables. We assume that there exists a ratio 0 ≤ α ≤ 1 such that any partial order contains at least αN non-racing events.

Under this assumption, random walk admits an adversarial program with worst-case probability 1/P<sup>N</sup> for almost any α [33]. For PCT, since only the order of the (1 − α)N racing events may affect the partial order, the number of preemptions needed for a partial order in the worst case becomes (1 − α)N, and thus the worst-case probability bound is 1/N<sup>(1−α)N</sup>. For POS, the number of races D<sub>E</sub> is reduced to (1 − α)N × (P − 1) in the worst case, and Theorem 1 guarantees the probability lower bound

$$\frac{1}{\mathcal{P}^{\mathcal{N}}} \left( \frac{1}{1 - (1 - 1/P)\alpha} \right)^{\alpha \mathcal{N}/2}$$

Thus, POS outperforms random walk when α > 0 and degenerates to random walk when α = 0. Also, POS outperforms PCT if N > P (when α = 0) or N<sup>1/α−1</sup> > P<sup>1/α</sup>(1 + α/P − α)<sup>1/2</sup> (when 0 < α < 1). For example, when P = 2 and α = 1/2, POS outperforms PCT if N > 2√3. In other words, in this case, POS is better than PCT if there are at least four total events.
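The comparison can be reproduced numerically from the bounds above (a sketch; the function names are ours):

```python
def pos_lower_bound(P, N, alpha):
    """Worst-case probability lower bound for POS from the analysis above:
    (1/P^N) * (1 / (1 - (1 - 1/P)*alpha))^(alpha*N/2)."""
    return (1 / P**N) * (1 / (1 - (1 - 1/P) * alpha)) ** (alpha * N / 2)

def pct_lower_bound(N, alpha):
    """PCT worst-case bound 1/N^((1-alpha)*N) under the same assumption."""
    return 1 / N ** ((1 - alpha) * N)

def random_walk_bound(P, N):
    """Random walk worst-case bound 1/P^N."""
    return 1 / P**N
```

With P = 2 and α = 1/2, the POS bound overtakes the PCT bound between N = 3 and N = 4, matching the 2√3 threshold; with α = 0, the POS bound collapses to the random-walk bound.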

# **5 Implementation**

The algorithm of POS requires a pre-determined priority map, while the implementation could decide the event priority on demand when new events appear. The implementation of POS is shown in Algorithm 4, where lines 14–18 are for the priority reassignment. Variable s represents the current program state with the following interfaces:


In the algorithm, if a race is detected during scheduling, the priority of the delayed event in the race is removed and then reassigned at lines 6–9.

*Relaxation for Read-Only Events.* The abstract interface s.IsRacing(...) allows us to relax our model for read-only events. When both e and e′ are read-only events, s.IsRacing(e, e′) returns false even if they access the same object. Our evaluations show that this relaxation improves the execution time of POS.

*Fairness Workaround.* POS is probabilistically fair: for an enabled event e with priority p > 0, the probability that e is delayed for k → ∞ steps without racing is at most (1 − p<sup>P</sup>)<sup>k</sup> → 0. However, POS may still delay events for a prolonged time, slowing down the test. To alleviate this, the current implementation resets all event priorities every 10<sup>3</sup> voluntary context-switch events, e.g., sched_yield() calls. This is only useful for speeding up a few benchmark programs that have busy loops (the sched_yield() calls were added by the SCTBench creators) and has minimal impact on the probability of hitting bugs.

```
1: procedure POS(s)                          ▷ s: the initial state of the program
2:     pri ← [⊥ → −∞]   ▷ Initially, no priority is assigned except the symbol ⊥
3:     while s.Enabled() ≠ ∅ do
4:         e∗ ← ⊥                                       ▷ Assume ⊥ ∉ s.Enabled()
5:         for each e ∈ s.Enabled() do
6:             if e ∉ pri then
7:                 newPriority ← U(0, 1)
8:                 pri ← pri[e → newPriority]
9:             end if
10:            if pri(e∗) < pri(e) then
11:                e∗ ← e
12:            end if
13:        end for
14:        for each e ∈ s.Enabled() do                       ▷ Update priorities
15:            if e ≠ e∗ ∧ s.IsRacing(e, e∗) then
16:                pri ← pri \ {e}    ▷ The priority is reassigned at the next step
17:            end if
18:        end for
19:        s ← s.Execute(e∗)
20:    end while
21:    return s
22: end procedure
```
### **Algorithm 4.** Testing a program with POS
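Algorithm 4 translates almost directly into Python. This is a sketch under an assumed state interface (`enabled`, `is_racing`, `execute`), not Maple's actual API:

```python
import random

def pos_run(state):
    """Runnable sketch of Algorithm 4: priorities are drawn on demand for
    newly seen events, and the priority of every enabled event racing with
    the chosen event is deleted, to be re-drawn at the next step."""
    pri = {}
    while state.enabled():
        e_star = None
        for e in state.enabled():
            if e not in pri:
                pri[e] = random.random()       # assign on first appearance
            if e_star is None or pri[e_star] < pri[e]:
                e_star = e
        for e in state.enabled():              # priority reassignment
            if e != e_star and state.is_racing(e, e_star):
                del pri[e]                     # re-drawn at the next step
        state = state.execute(e_star)
    return state
```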

# **6 Evaluation**

To understand the performance of POS and compare with other sampling methods, we conducted experiments on both micro benchmarks (automatically generated) and macro benchmarks (including real-world programs).

### **6.1 Micro Benchmark**

We generated programs with a small number of static events as the micro benchmarks. We assumed multi-threaded programs with t threads, where each thread executes m events accessing o objects. To keep the program space tractable, we chose t = m = o = 4, resulting in 16 total events. To simulate different object access patterns in real programs, we randomly distributed events accessing different objects with the following configurations:


The results are shown in Table 1. The benchmark columns show the characteristics of each generated program, including (1) the configuration used for generating the program; (2) the number of distinct partial orders in the program; (3) the maximum number of preemptions needed for covering all partial orders; and (4) the maximum number of races in any partial order. We measured the



**Table 2.** Coverage on the micro benchmark programs - 50% read


coverage of each sampling method on each program by the minimum hit ratio over the partial orders of the program. On every program, we ran each sampling method 5 × 10<sup>7</sup> times (except for random walk, for which we calculated the exact probabilities). If a program was not fully covered by an algorithm within the sample limit, its coverage is denoted as "0(x)", where x is the number of covered partial orders. We let PCT sample the exact number of preemptions needed for each case. We tweaked PCT to improve its coverage by adding a dummy event at the beginning of each thread, as otherwise PCT cannot preempt the actual first event of each thread. The results show that POS performed the best among all algorithms. For each algorithm, we calculated the overall performance as the geometric mean of the coverage.<sup>1</sup> Overall, POS performed ∼7.0× better than the other algorithms (∼134.1× excluding RAPOS and BasicPOS).

To understand our relaxation of read-only events, we generated another set of programs with the same configurations, but with half of the events read-only. The results are shown in Table 2, where the relaxed algorithm is denoted as POS∗. Overall, POS∗ performed roughly ∼1.4× as well as POS and ∼5.0× better than the other algorithms (∼226.4× excluding RAPOS and BasicPOS).

### **6.2 Macro Benchmark**

We used SCTBench [24], a collection of concurrency bugs in multi-threaded programs, to evaluate POS on practical programs. SCTBench collects 49 concurrency bugs from previous parallel workloads [3,27] and concurrency testing/verification work [4,6,18,21,31]. SCTBench comes with a concurrency testing tool, Maple [32], which intercepts pthread primitives and shared memory accesses and controls their interleaving. When a bug is triggered, it is caught by Maple and reported back. We implemented POS with the relaxation of read-only events in Maple. Each sampling method was evaluated by the ratio of bug hits to tries in each case: for each case, we ran each sampling method until the number of tries reached 10<sup>4</sup>, recorded the bug hit count h and the total run count t, and calculated the ratio as h/t.

Two cases in SCTBench were excluded, parsec-2.0-streamcluster2 and radbench-bug1, because none of the algorithms could hit their bugs even once, which conflicts with previous results. We strengthened the case safestack-bug1 by internally repeating it 10<sup>4</sup> times (and shrunk the run limit to 500); this amortizes the per-run overhead of Maple, which can take up to a few seconds. We modified PCT to reset at every internal loop. We evaluated variants of PCT, denoted PCT-d, representing PCT with d − 1 preemption points, to reduce the disadvantage of a sub-optimal d. The results are shown in Table 3. We omit cases in which all algorithms hit the bugs in more than half of their tries. The cases are sorted by the minimum hit ratio across algorithms. The performance of each algorithm is aggregated as the geometric mean of its hit ratios<sup>2</sup> over all cases. The best hit ratio for each case is marked in blue.
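The aggregation described above (geometric mean of hit ratios, with a case that was never hit accounted conservatively as 1/t, per footnote 2) can be sketched as:

```python
from math import prod

def aggregate_hit_ratios(hits, tries):
    """Geometric mean of per-case hit ratios h/t, accounting a case with
    zero hits conservatively as 1/t (footnote 2 in the text)."""
    ratios = [max(h, 1) / t for h, t in zip(hits, tries)]
    return prod(ratios) ** (1 / len(ratios))
```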

The results of macro benchmark experiments can be highlighted as below:

– Overall, POS performed best at hitting bugs in SCTBench. The geometric mean of POS is ∼2.6× better than PCT and ∼4.7× better than random walk. Because the buggy interleavings in each case are not necessarily the most

<sup>1</sup> For each case in which an algorithm does not achieve full coverage, we conservatively account its coverage as 1/(5 × 10<sup>7</sup>) in the geometric mean.

<sup>2</sup> For each case in which an algorithm cannot hit the bug once within the limit, we conservatively account its hit ratio as 1/t in the calculation of the geometric mean.

difficult ones to sample, POS may not perform overwhelmingly better than the others, as it does in the micro benchmarks.



**Table 3.** Bug hit ratios on macro benchmark programs

# **7 Conclusion**

We have presented POS, a concurrency testing approach to sample the partial order of concurrent programs. POS's core algorithm is simple and lightweight: (1) assign a random priority to each event in a program; (2) repeatedly execute the event with the highest priority; and (3) after executing an event, reassign its racing events with random priorities. We have formally shown that POS has an exponentially stronger probabilistic error-detection guarantee than existing randomized scheduling algorithms. Evaluations have shown that POS is effective in covering the partial-order space of micro-benchmarks and finding concurrency bugs in real-world programs such as Firefox's JavaScript engine SpiderMonkey.

**Acknowledgements.** We thank the anonymous reviewers for their helpful feedback that greatly improved this paper. We thank Madan Musuvathi for insightful discussions. This research was supported in part by NSF CNS-1564055, ONR N00014-16-1-2263, and ONR N00014-17-1-2788 grants.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Reasoning About TSO Programs Using Reduction and Abstraction**

Ahmed Bouajjani<sup>1</sup>, Constantin Enea<sup>1</sup>, Suha Orhun Mutluergil<sup>2(B)</sup>, and Serdar Tasiran<sup>3</sup>

<sup>1</sup> IRIF, University Paris Diderot and CNRS, Paris, France
{abou,cenea}@irif.fr
<sup>2</sup> Koc University, Istanbul, Turkey
smutluergil@ku.edu.tr
<sup>3</sup> Amazon Web Services, New York, USA
tasirans@amazon.com

**Abstract.** We present a method for proving that a program running under the Total Store Ordering (TSO) memory model is robust, i.e., all its TSO computations are equivalent to computations under the Sequential Consistency (SC) semantics. This method is inspired by Lipton's reduction theory for proving atomicity of concurrent programs. For programs which are not robust, we introduce an abstraction mechanism that allows to construct robust programs over-approximating their TSO semantics. This enables the use of proof methods designed for the SC semantics in proving invariants that hold on the TSO semantics of a non-robust program. These techniques have been evaluated on a large set of benchmarks using the infrastructure provided by CIVL, a generic tool for reasoning about concurrent programs under the SC semantics.

# **1 Introduction**

A classical memory model for shared-memory concurrency is Sequential Consistency (SC) [16], where the actions of different threads are interleaved while the program order between actions of each thread is preserved. For performance reasons, modern multiprocessors implement weaker memory models, e.g., Total Store Ordering (TSO) [19] in x86 machines, which relax the program order. For instance, the main feature of TSO is the write-to-read relaxation, which allows reads to overtake writes. This relaxation reflects the fact that writes are buffered before being flushed non-deterministically to the main memory.

Nevertheless, most programmers usually assume that memory accesses happen instantaneously and atomically like in the SC memory model. This assumption is safe for data-race free programs [3]. However, many programs employing lock-free synchronization are not data-race free, e.g., programs implementing synchronization operations and libraries implementing concurrent objects. In

This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 678177).

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 336–353, 2018. https://doi.org/10.1007/978-3-319-96142-2\_21

most cases, these programs are designed to be robust against relaxations, i.e., they admit the same behaviors as if they were run under SC. Memory fences must be included appropriately in programs in order to prevent non-SC behaviors. Getting such programs right is a notoriously difficult and error-prone task. Robustness can also be used as a proof method, that allows to reuse the existing SC verification technology. Invariants of a robust program running under SC are also valid for the TSO executions. Therefore, the problem of checking robustness of a program against relaxations of a memory model is important.

In this paper, we address the problem of checking robustness in the case of TSO. We present a methodology for proving robustness which uses the concepts of left/right mover in Lipton's reduction theory [17]. Intuitively, a program statement is a left (resp., right) mover if it commutes to the left (resp., right) with respect to the statements in the other threads. These concepts have been used by Lipton [17] to define a program rewriting technique which enlarges the atomic blocks in a given program while preserving the same set of behaviors. In essence, robustness can also be seen as an atomicity problem: every write statement corresponds to two events, inserting the write into the buffer and flushing the write from the buffer to the main memory, which must be proved to happen atomically, one after the other. However, differently from Lipton's reduction theory, the events that must be proved atomic do not correspond syntactically to different statements in the program. This leads to different uses of these concepts which cannot be seen as a direct instantiation of this theory.

In case programs are not robust, or they cannot be proven robust using our method, we define a program abstraction technique that roughly, makes reads non-deterministic (this follows the idea of combining reduction and abstraction introduced in [12]). The non-determinism added by this abstraction can lead to programs which can be proven robust using our method. Then, any invariant (safety property) of the abstraction, which is valid under the SC semantics, is also valid for the TSO semantics of the original program. As shown in our experiments, this abstraction leads in some cases to programs which reach exactly the same set of configurations as the original program (but these configurations can be reached in different orders), which implies no loss of precision.

We tested the applicability of the proposed reduction and abstraction based techniques on an exhaustive benchmark suite containing 34 challenging programs (from [2,7]). These techniques were precise enough for proving robustness of 32 of these programs. One program (presented in Fig. 3) is not robust, and required abstraction in order to derive a robust over-approximation. There is only one program which cannot be proved robust using our techniques (although it is robust). We believe however that an extension of our abstraction mechanism to atomic read-write instructions will be able to deal with this case. We leave this question for future work.

An extended version of this paper with missing proofs can be found at [8].

# **2 Overview**

The TSO memory model allows strictly more behaviors than the classic SC memory model: writes are first stored in a thread-local buffer and

**Fig. 1.** An example message passing program and a sample trace. Edges of the trace show the happens-before order of global accesses; they are simplified by applying transitive reduction.

non-deterministically flushed into the shared memory at a later time (also, the write buffers are accessed first when reading a shared variable). However, in practice, many programs are *robust*, i.e., they have exactly the same behaviors under TSO and SC. Robustness implies for instance, that any invariant proved under the SC semantics is also an invariant under the TSO semantics. We describe in the following a sound methodology for checking that a program is *robust*, which avoids modeling and verifying TSO behaviors. Moreover, for non-robust programs, we show an abstraction mechanism that allows to obtain robust programs over-approximating the behaviors of the original program.

As a first example, consider the simple "message passing" program in Fig. 1. The send method sets the value of the "communication" variable y to some predefined value from register r1. Then, it raises a flag by setting the variable x to 1. Another thread executes the method recv which waits until the flag is set and then, it reads y (and stores the value to register r2). This program is robust, TSO doesn't enable new behaviors although the writes may be delayed. For instance, consider the following TSO execution (we assume that r1 = 42):

$$\begin{aligned} \langle t\_1, isu \rangle \quad & \quad \langle t\_1, isu \rangle \langle t\_1, com, y, 42 \rangle \quad & \quad \langle t\_1, com, x, 1 \rangle\\ \langle t\_2, rd, x, 0 \rangle \quad & \quad \langle t\_2, rd, x, 0 \rangle \quad & \quad \langle t\_2, rd, x, 1 \rangle \langle t\_2, rd, y, 42 \rangle \end{aligned}$$

The actions of each thread (t<sup>1</sup> or t2) are aligned horizontally, they are either *issue* actions (isu) for writes being inserted into the local buffer (e.g., the first (t1, isu) represents the write of y being inserted to the buffer), *commit* actions (com) for writes being flushed to the main memory (e.g., (t1, com, y, 42) represents the write y := 42 being flushed and executed on the shared memory), and *read* actions for reading values of shared variables. Every assignment generates two actions, an issue and a commit. The issue action is "local", it doesn't enable or disable actions of other threads.

The above execution can be "mimicked" by an SC execution. If we had not performed the isu actions of t<sup>1</sup> that early but delayed them until just before their corresponding com actions, we would obtain a valid SC execution of the same program with no need to use store buffers:

(*t*1*, wr, y,* 42) (*t*1*, wr, x,* 1) (*t*2*, rd, x,* 0) (*t*2*, rd, x,* 0) (*t*2*, rd, x,* 1)(*t*2*, rd, y,* 42)

Above, consecutive isu and com actions are combined into a single *write* action (wr). This intuition corresponds to an equivalence relation between TSO executions and SC executions: if both executions contain the same actions on the shared variables (performing the same accesses on the same variables with the same values) and the order of actions on the same variable are the same for both executions, we say that these executions have the same *trace* [20], or that they are *trace-equivalent*. For instance, both the SC and TSO executions given above have the same trace given in Fig. 1. The notion of trace is used to formalize robustness for programs running under TSO [7]: a program is called *robust* when every TSO execution has the same trace as an SC execution.
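The trace-equivalence check described above can be made concrete: two executions over the same actions are trace-equivalent when they agree on the per-variable access orders (and, implicitly, on each thread's program order). A small sketch, where the tuple encoding ⟨thread, action, var, value⟩ and the helper name are ours:

```python
from collections import defaultdict

def trace_of(execution):
    """Project an execution (a list of (thread, action, var, value) memory
    events) to its trace: the per-variable and per-thread event orders.
    Executions with equal projections are trace-equivalent."""
    per_var, per_thread = defaultdict(list), defaultdict(list)
    for ev in execution:
        thread, action, var, value = ev
        per_var[var].append(ev)
        per_thread[thread].append(ev)
    return dict(per_var), dict(per_thread)

# The TSO execution (issue/commit pairs merged into writes) and the SC
# execution from the message-passing example agree on both projections:
tso = [("t1", "wr", "y", 42), ("t2", "rd", "x", 0), ("t1", "wr", "x", 1),
       ("t2", "rd", "x", 1), ("t2", "rd", "y", 42)]
sc  = [("t2", "rd", "x", 0), ("t1", "wr", "y", 42), ("t1", "wr", "x", 1),
       ("t2", "rd", "x", 1), ("t2", "rd", "y", 42)]
```

Reordering a write past a read of the same variable changes the per-variable order, and hence the trace.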

Our method for showing robustness is based on proving that every TSO execution can be permuted to a trace-equivalent SC execution (where issue actions are immediately followed by the corresponding commit actions). We say that an action α moves right until another action β in an execution if we can swap α with every later action until β while preserving the feasibility of the execution (e.g., not invalidating reads and keeping the actions enabled). We observe that if α moves right until β then the execution obtained by moving α just before β has the same trace with the initial execution. We also have the dual notion of moves-left with a similar property. As a corollary, if every issue action moves right until the corresponding commit action or every commit action moves left until the corresponding issue action, we can find an equivalent SC execution. For our execution above, the issue actions of the first thread move right until their corresponding com actions. Note that there is a commit action which doesn't move left: moving (t1, com, x, 1) to the left of (t2, rd, x, 0) is not possible since it would disable this read.

In general, issue actions and other thread local actions (e.g. statements using local registers only) move right of other threads' actions. Moreover, issue actions (t, isu) move right of commit actions of the same thread that correspond to writes issued before (t, isu). For the message passing program, the issue actions move right until their corresponding commits in all TSO executions since commits cannot be delayed beyond actions of the same thread (for instance reads). Hence, we can safely deduce that the message passing program is robust. However, this reasoning may fail when an assignment is followed by a read of a shared variable in the same thread.

**Fig. 2.** An example store buffering program.

Consider the "store-buffering"-like program in Fig. 2. This program is also robust. However, the issue action generated by x := 1 might not always move right until the corresponding commit. Consider the following execution (we assume that initially z = 5):

$$\begin{array}{l} (t_1, isu)\ (t_1, rd, z, 5)\ \cdots\ (t_1, com, x, 1)\ \cdots \\ (t_2, isu)\ (t_2, com, y, 1)\ (t_2, \tau)\ (t_2, rd, x, 0)\ \cdots \end{array}$$

Here, we assumed that t1 executes foo and t2 executes bar. The fence instruction generates an action τ. The first issue action of t1 cannot be moved to the right until the corresponding commit action, since this would violate the program order. Moreover, the corresponding commit action does not move left due to the read action of t2 on x (which would become infeasible).

The key point here is that a later read action of the same thread, (t1, rd, z, 5), does not allow the issue action to move right (until the commit). However, this read action itself moves to the right of other threads' actions. So, we can construct an equivalent SC execution by first moving the read action right, until just after the commit (t1, com, x, 1), and then moving the issue action right until the commit action.

In general, we say that an issue (t, isu) of a thread t moves right until the corresponding commit if each read action of t after (t, isu) can move right until the next action of t that follows both the read and the commit. Actually, this property is not required for all such reads. The read actions that follow a fence cannot happen between the issue and the corresponding commit actions. For instance, the last read action of foo cannot happen between the first issue of foo and its corresponding commit action. Such reads that follow a fence are not required to move right. In addition, we can omit the right-moves check for read actions that read from the thread local buffer (see Sect. 3 for more details).

In brief, our method for checking robustness does the following for every write instruction (assignment to a shared variable): either the commit action of this write moves left or the actions of later read instructions that come before a fence move right in all executions. This semantic condition can be checked using the concept of movers [17] as follows: every write instruction is either a left-mover or all the read instructions that come before a fence and can be executed later than the write (in an SC execution) are right-movers. Note that this requires no modeling and verification of TSO executions.

For non-robust programs that might reach different configurations under TSO than under SC, we define an abstraction mechanism that replaces read instructions with "non-deterministic" reads that can read more values than the original instructions. The abstracted program has more behaviors than the original one (under both SC and TSO), but it may turn out to be robust. When it is robust, any property of its SC semantics also holds for the TSO semantics of the original program.

Consider the work stealing queue implementation in Fig. 3. A queue is represented with an array items. Its head and tail indices are stored in the shared variables H and T, respectively. There are three procedures that can operate on this queue: any number of threads may execute the steal method and remove an element from the head of the queue, and a single, distinguished thread may nondeterministically execute the put or take methods. The put method inserts an element at the tail index and the take method removes an element from the tail index.

**Fig. 3.** Work stealing queue.

This program is not robust. If there is a single element in the queue and the take method takes it by delaying its writes after some concurrent steals, one of the concurrent steals might also remove this last element. Popping the same element twice is not possible under SC, but it is possible under TSO semantics. However, we can still prove some properties of this program under TSO. Our robustness check fails on this program because the writes of the worker thread (executing the put and take methods) are not left movers and the read from the variable H in the take method is not a right mover. This read is not a right mover w.r.t. successful CAS actions of the steal procedure that increment H.

We apply an abstraction on the instruction of the take method that reads from H such that instead of reading the exact value of H, it can read any value less than or equal to the value of H. We write this instruction as havoc(h, h ≤ H) (it assigns to h a nondeterministic value satisfying the constraint h ≤ H). Note that this abstraction is sound in the sense that it reaches more states under SC/TSO than the original program.
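As an executable reading of this abstraction (a sketch under our own encoding; we restrict to non-negative values since H and the local variable h are queue indices):

```python
import random

def read_exact(H):
    # the original read instruction: h := H
    return H

def havoc_le(H, rng=random):
    # havoc(h, h <= H): h gets any value satisfying the constraint h <= H
    return rng.randint(0, H)

# Soundness of the abstraction: every outcome of the original read is among
# the outcomes allowed by the havoc, so the abstract program reaches at least
# the same states as the original, under both SC and TSO.
H = 7
assert read_exact(H) <= H
assert all(0 <= havoc_le(H) <= H for _ in range(100))
```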

The resulting program is robust. The statement havoc(h, h ≤ H) is a right mover w.r.t. successful CAS actions of the stealer threads. Hence, for all the write instructions, the reachable read instructions become right movers and our check succeeds. The abstract program satisfies the specification of an idempotent work stealing queue (elements can be dequeued multiple times) which implies that the original program satisfies this specification as well.

### **3 TSO Robustness**

We present the syntax and the semantics of a simple programming language used to state our results. We define both the TSO and the SC semantics, an

**Fig. 4.** Syntax of the programs. The star (∗) indicates zero or more occurrences of the preceding element. *pid*, *tid*, *var*, *reg* and *label* are elements of their given domains representing the program identifiers, thread identifiers, shared variables, registers and instruction labels, respectively. *expr* is an arithmetic expression over *reg* ∗. Similarly, *bexpr* is a boolean expression over *reg* ∗.

abstraction of executions called *trace* [20] that, intuitively, captures the happens-before relation between actions in an execution, and the notion of robustness.

**Syntax.** We consider a simple programming language, defined in Fig. 4. Each program P has a finite number of shared variables x⃗ and a finite number of threads t⃗. Each thread t_i has a finite set of local registers r⃗_i and a start label l_i^0. Bodies of the threads are defined as finite sequences of labelled instructions. Each instruction is followed by a goto statement which defines the evolution of the program counter. Note that multiple instructions can be assigned to the same label, which allows us to write non-deterministic programs, and multiple goto statements can direct the control to the same label, which allows us to mimic imperative constructs like loops and conditionals. An assignment to a shared variable var := expr is called a *write instruction*. Also, an instruction of the form reg := var is called a *read instruction*.

Instructions can read from or write to shared variables or registers. Each instruction accesses at most one shared variable. We assume that the program P comes with a domain D of values that are stored in variables and registers, and a set of functions F used to calculate arithmetic and boolean expressions.

The fence statement empties the buffer of the executing thread. The cas (compare-and-swap) instruction checks whether the value of its input variable is equal to its second argument. If so, it sets the third argument as the new value of the variable and returns true; otherwise, it returns false. In either case, cas empties the buffer immediately after it executes. The assume statement allows us to check conditions: if the boolean expression it contains holds in the current state, it behaves like a skip; otherwise, the execution blocks. A formal description of the instructions is given in Fig. 5.


**Fig. 5.** The TSO transition relation. The function *ins* takes a label *l* and returns the set of instructions labelled by *l*. We always assume that x ∈ x⃗, r ∈ r⃗_t and pc′ = pc[t → l′], where pc(t) : *inst* goto l′; is a labelled instruction of t and *inst* is the instruction described at the beginning of the rule. The evaluation function *eval* calculates the value of an arithmetic or boolean expression based on *mem* (*ae* stands for arithmetic expression). Sequence concatenation is denoted by ◦. The function *varsOfBuf* takes a sequence of pairs and returns the set consisting of the first fields of these pairs.

**TSO Semantics.** Under the TSO memory model, each thread maintains a local queue to buffer write instructions. A state s of the program is a triple of the form (pc, mem, buf). Let L be the set of labels in the program P. Then, pc : t⃗ → L gives the next instruction to be executed by each thread, mem : (⋃_{t_i ∈ t⃗} r⃗_i) ∪ x⃗ → D represents the current values of the shared variables and registers, and buf : t⃗ → (x⃗ × D)* represents the contents of the buffers.
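To make the buffer mechanics concrete, here is a minimal executable sketch (our own toy model for illustration; the formal relation is the one in Fig. 5): writes are queued per thread, reads consult the thread's own buffer first (store forwarding), and commits drain the oldest buffered write to memory.

```python
from collections import deque

class TSOState:
    """Toy TSO state (pc omitted): shared memory plus per-thread FIFO buffers."""

    def __init__(self, variables):
        self.mem = {v: 0 for v in variables}   # shared memory, 0-initialised
        self.buf = {}                          # thread -> deque of (var, val)

    def issue(self, t, var, val):
        # (t, isu): the write is buffered, invisible to other threads
        self.buf.setdefault(t, deque()).append((var, val))

    def commit(self, t):
        # (t, com): the oldest buffered write reaches shared memory
        var, val = self.buf[t].popleft()
        self.mem[var] = val

    def read(self, t, var):
        # (t, rd): the newest own buffered write wins, else main memory
        for v, val in reversed(self.buf.get(t, deque())):
            if v == var:
                return val
        return self.mem[var]

s = TSOState(["x", "y"])
s.issue("t1", "y", 42)
s.issue("t1", "x", 1)
assert s.read("t2", "x") == 0      # t2 cannot see t1's buffered writes
assert s.read("t1", "x") == 1      # t1 forwards from its own buffer
s.commit("t1"); s.commit("t1")     # drain in FIFO order: y first, then x
assert s.read("t2", "x") == 1 and s.mem["y"] == 42
```

This reproduces the scenario from Sect. 2: between issue and commit, another thread still reads the old value 0 for x.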

There is a special initial state s_0 = (pc_0, mem_0, buf_0). At the beginning, each thread t_i points to its initial label l_i^0, i.e., pc_0(t_i) = l_i^0. We assume that there is a special default value 0 ∈ D. All the shared variables and registers are initialized to 0, i.e., mem_0(x) = 0 for all x ∈ (⋃_{t_i ∈ t⃗} r⃗_i) ∪ x⃗. Lastly, all the buffers are initially empty, i.e., buf_0(t_i) = ε for all t_i ∈ t⃗.

The transition relation →_TSO between program states is defined in Fig. 5. Transitions are labelled by actions. Each action is an element of t⃗ × ({τ, isu} ∪ ({com, rd} × x⃗ × D)). Actions record the thread performing the transition and the actual parameters of the reads and writes to shared variables. We are only interested in accesses to shared variables; therefore, the other transitions are labelled with τ, as thread-local actions.

A TSO execution of a program P is a sequence of actions π = π_1, π_2, ..., π_n such that there exists a sequence of states σ = σ_0, σ_1, ..., σ_n where σ_0 = s_0 is the initial state of P and σ_{i−1} →^{π_i} σ_i is a valid transition for every i ∈ {1, ..., n}. We assume that buffers are empty at the end of the execution.

**SC Semantics.** Under SC, a program state is a pair of the form (pc, mem) where pc and mem are defined as above. Shared variables are read directly from the memory mem and every write updates directly the memory mem. To make the relationship between SC and TSO executions more obvious, every write instruction generates isu and com actions which follow one another in the execution (each isu is immediately followed by the corresponding com). Since there are no write buffers, fence instructions have no effect under SC.

**Traces and TSO Robustness.** Consider a (TSO or SC) execution π of P. The trace of π is a graph, denoted by Tr(π). Nodes of Tr(π) are the actions of π, except the τ actions. In addition, isu and com actions are unified into a single node: the isu action that puts an element into the buffer and the corresponding com action that drains that element from the buffer correspond to the same node in the trace. Edges of Tr(π) represent the happens-before order (hb) between these actions. The hb order is the union of four relations. The program order po keeps the order of actions performed by the same thread, excluding the com actions. The store order so keeps the order of com actions on the same variable that write different values<sup>1</sup>. The read-from relation, denoted by rf, relates a com action to a rd action that reads its value. Lastly, the from-reads relation fr relates a rd action to a com action that overwrites the value read by rd; it is defined as the composition of rf and so.
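The four happens-before components can be sketched as follows (a hypothetical encoding written for illustration: nodes are indices into an action list with isu/com already unified into "wr", the rf relation is given as input, and po is simplified to plain same-thread order):

```python
def hb_edges(actions, rf):
    """Build hb = po ∪ so ∪ rf ∪ fr over actions (thread, kind, var, val).

    `rf` maps each read index to the index of the write it reads from.
    """
    n = len(actions)
    po = [(i, j) for i in range(n) for j in range(i + 1, n)
          if actions[i][0] == actions[j][0]]
    so = [(i, j) for i in range(n) for j in range(i + 1, n)
          if actions[i][1] == actions[j][1] == "wr"
          and actions[i][2] == actions[j][2]
          and actions[i][3] != actions[j][3]]
    rf_edges = [(w, r) for r, w in rf.items()]
    # from-reads as the composition of rf and so: a read happens before any
    # write that overwrites the value it read
    fr = [(r, j) for r, w in rf.items() for (i, j) in so if i == w]
    return po + so + rf_edges + fr

acts = [("t1", "wr", "x", 1), ("t2", "rd", "x", 1), ("t1", "wr", "x", 2)]
edges = hb_edges(acts, rf={1: 0})
assert (1, 2) in edges   # fr: the read of x=1 precedes the write of x=2
assert (0, 2) in edges   # so: x=1 is ordered before x=2
```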

We say that the program P is TSO robust if for any TSO execution π of P, there exists an SC execution π′ such that Tr(π) = Tr(π′). It has been proven that robustness implies that the program reaches the same valuations of the shared memory under both TSO and SC [7].

# **4 A Reduction Theory for Checking Robustness**

We present a methodology for checking robustness which builds on concepts introduced in Lipton's reduction theory [17]. This theory allows rewriting a

<sup>1</sup> Our definition of store order deviates slightly from the standard definition which relates any two writes writing on the same variable, independently of values. The notion of TSO trace robustness induced by this change is slightly weaker than the original definition, but still implies preservation of any safety property from the SC semantics to the TSO semantics. The results concerning TSO robustness used in this paper (Lemma 1) are also not affected by this change. See [8] for more details.

given concurrent program (running under SC) into an equivalent one that has larger atomic blocks. Proving robustness is similar in spirit in the sense that one has to prove that issue and commit actions can happen together atomically. However, differently from the original theory, these actions do not correspond to different statements in the program (they are generated by the same write instruction). Nevertheless, we show that the concepts of left/right movers can be also used to prove robustness.

**Movers.** Let π = π_1, ..., π_n be an SC execution. We say that the action π_i *moves right (resp., left)* in π if the sequence π_1, ..., π_{i−1}, π_{i+1}, π_i, π_{i+2}, ..., π_n (resp., π_1, ..., π_{i−2}, π_i, π_{i−1}, π_{i+1}, ..., π_n) is also a valid execution of P, the thread of π_i is different from the thread of π_{i+1} (resp., π_{i−1}), and both executions reach the same end state σ_n. Since every issue action is followed immediately by the corresponding commit action, an issue action moves right (resp., left) exactly when the corresponding commit action also moves right (resp., left).
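A coarse, purely syntactic sufficient condition can illustrate the swap (our simplification for this sketch; the definition above additionally demands that the swapped sequence is feasible and ends in the same state, which in general requires re-executing the program): two adjacent actions of different threads commute when they access different variables, or are both reads.

```python
def commutes(a, b):
    """Sufficient (not necessary) condition for swapping two adjacent actions
    of different threads without changing the end state. Actions are encoded
    as (thread, kind, var, val) with kind in {"wr", "rd"}."""
    (_, kind_a, var_a, _), (_, kind_b, var_b, _) = a, b
    return var_a != var_b or (kind_a == "rd" and kind_b == "rd")

def moves_right(execution, i):
    """Can execution[i] be swapped with its immediate successor?"""
    a, b = execution[i], execution[i + 1]
    return a[0] != b[0] and commutes(a, b)

ex = [("t1", "wr", "y", 42), ("t2", "rd", "x", 0), ("t1", "wr", "x", 1)]
assert moves_right(ex, 0)        # wr y vs rd x: different variables commute
assert not moves_right(ex, 1)    # rd x 0 vs wr x 1: the read would be disabled
```

The failing case is exactly the non-left-mover commit from Sect. 2: a write on x cannot cross a read of x that saw the old value.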

Let instOf_π be a function, depending on an execution π, which, given an action π_i ∈ π, returns the labelled instruction that generated π_i. Then, a labelled instruction ℓ is a *right (resp., left) mover* if for all SC executions π of P and for all actions π_i of π such that instOf_π(π_i) = ℓ, π_i moves right (resp., left) in π.

A labelled instruction is a *non-mover* if it is neither left nor right mover, and it is a *both mover* if it is both left and right mover.

**Reachability Between Instructions.** An instruction ℓ′ is *reachable from* the instruction ℓ if ℓ and ℓ′ belong to the same thread and there exists an SC execution π and indices 1 ≤ i < j ≤ |π| such that instOf_π(π_i) = ℓ and instOf_π(π_j) = ℓ′. We say that ℓ′ is reachable from ℓ *before a fence* if, in addition, π_k is not an action generated by a fence instruction in the same thread as ℓ, for all i < k < j. When ℓ is a write instruction and ℓ′ a read instruction, we say that ℓ′ is *buffer-free reachable* from ℓ if π_k is neither an action generated by a fence instruction in the same thread as ℓ nor a write action on the same variable that ℓ′ reads from, for all i < k < j.

**Definition 1.** *We say that a write instruction* w *is* atomic *if it is a left mover or every read instruction* r *buffer-free reachable from* w *is a right mover. We say that* P *is* write atomic *if every write instruction* w *in* P *is atomic.*

Note that all of the notions used to define write atomicity (movers and instruction reachability) are based on SC executions of the programs. The following result shows that write atomicity implies robustness.

# **Theorem 1 (Soundness).** *If* P *is write atomic, then it is robust.*

We will prove the contrapositive of the statement. For the proof, we need the notion of a minimal violation defined in [7]. A minimal violation is a robustness violation (a TSO execution with no trace-equivalent SC execution) that minimizes the sum, over all writes, of the number of same-thread actions between the isu action and the corresponding com action. A minimal violation is of the form π = π_1, (t, isu), π_2, (t, rd, y, ∗), π_3, (t, com, x, ∗), π_4 such that: π_1 is an SC execution; only t delays com actions; the first delayed com action is the (t, com, x, ∗) after π_3, and it corresponds to the (t, isu) after π_1; π_2 does not contain any com or fence actions of t (the writes of t are delayed until after (t, rd, y, ∗)); (t, rd, y, ∗) →_hb+ act for all act ∈ π_3 ◦ {(t, com, x, ∗)} (isu and com actions of other threads are counted as one action for this purpose); π_3 does not contain any action of t; π_4 contains only and all of the com actions of t that were delayed in (t, isu) ◦ π_2; and no com action in (t, com, x, ∗) ◦ π_4 touches y.

Minimal violations are important for us because of the following property:

**Lemma 1 (Completeness of Minimal Violations** [7]**).** *The program* P *is robust iff it does not have a minimal violation.*

Before going into the proof of Theorem 1, we fix some notation. Let π be a sequence representing an execution or a fragment of one, and let Q be a set of thread identifiers. Then, π|_Q is the projection of π onto actions from the threads in Q. Similarly, π|_n is the projection of π onto its first n elements, for a number n, and sz(π) denotes the length of the sequence π. We also define a product operator ⊗. Let π and ρ be execution fragments. Then, π ⊗ ρ is the same as π except that if the i-th isu action of π is not immediately followed by a com action of the same thread, then the i-th com action of ρ is inserted after this isu. The product operator helps us complete unfinished writes in one execution fragment by inserting commit actions from another fragment immediately after the issue actions.
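The product operator can be sketched as follows (a hypothetical encoding of actions as (thread, kind, var, val) tuples, written for illustration):

```python
def product(pi, rho):
    """pi ⊗ rho: complete each isu of pi that is not immediately followed by
    its own thread's com, using the com actions of rho in order."""
    coms = [a for a in rho if a[1] == "com"]
    out, i = [], 0                     # i counts isu actions seen in pi
    for k, a in enumerate(pi):
        out.append(a)
        if a[1] == "isu":
            nxt = pi[k + 1] if k + 1 < len(pi) else None
            if not (nxt and nxt[0] == a[0] and nxt[1] == "com"):
                out.append(coms[i])    # insert the i-th com of rho
            i += 1
    return out

pi  = [("t1", "isu", "x", 1), ("t2", "rd", "y", 0)]
rho = [("t1", "com", "x", 1)]
assert product(pi, rho) == [("t1", "isu", "x", 1),
                            ("t1", "com", "x", 1),
                            ("t2", "rd", "y", 0)]
```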

*Proof (Theorem 1).* Assume P is not robust. Then, there exists a minimal violation π = π_1, α, π_2, θ, π_3, β, π_4 satisfying the conditions described above, where α = (t, isu), θ = (t, rd, y, ∗) and β = (t, com, x, ∗). Below, we show that the write instruction w = instOf(α) is not atomic.

- 1.1. ρ = π_1, π_2|_{t⃗∖{t}}, (π_3|_{t⃗∖{t}})|_{sz(π_3|_{t⃗∖{t}})−1}, γ, (α, β) is an SC execution of P, where γ is the last action of π_3. γ is a read or write action on x performed by a thread t′ other than t, and the value of γ is different from the value written by β.
  - 1.1.1. ρ is an SC execution because t never changes the value of a shared variable in π_2 and π_3. So, even if we remove the actions of t in those parts, the actions of the other threads remain enabled. Since the other threads perform only SC operations in π, the sequence π_1, π_2|_{t⃗∖{t}}, π_3|_{t⃗∖{t}} is an SC execution. From π, we also know that the first enabled action of t is α if we delay the actions of t in π_2 and π_3.
  - 1.1.2. The last action of π_3 is γ. By the definition of a minimal violation, we know that θ →_hb+ β and π_3 does not contain any action of t. So, there must exist an action γ ∈ π_3 such that either γ reads from x and γ →_fr β in π, or γ writes to x and γ →_st β in π. Moreover, γ is the last action of π_3, because if there were other actions after γ, we could delete them and obtain another minimal violation shorter than π, contradicting the minimality of π.
- 1.2. ρ′ = π_1, π_2|_{t⃗∖{t}}, (π_3|_{t⃗∖{t}})|_{sz(π_3|_{t⃗∖{t}})−1}, (α, β), γ′, where instOf(γ′) = instOf(γ), either is an SC execution with a different end state than the ρ defined in 1.1, or is not an SC execution at all.
- 2.1. We first consider the negation of the above condition. Assume that there are no actions γ = (t, rd, z, v_z) and γ′ = (t′, isu)(t′, com, z, v′_z) such that γ occurs before γ′ in π_2. Then, r = instOf(θ) is not a right mover and it is buffer-free reachable from w.
  - 2.1.1. ρ = π_1, π_2|_{t⃗∖{t}}, π_2|_{t} ⊗ π_4, θ, θ′ is a valid SC execution of P, where θ′ = (t′, isu)(t′, com, y, ∗) for some t′ ≠ t.
    - 2.1.1.1. ρ is an SC execution. π_1, π_2|_{t⃗∖{t}} is a valid SC execution since t does not update the value of any shared variable in π_2. Moreover, all of the actions of t become enabled after this sequence, since t never reads the value of a variable updated by another thread in π_2. Lastly, the first action of π_3 is enabled after this sequence.
    - 2.1.1.2. The first action of π_3 is θ′ = (t′, isu)(t′, com, y, ∗). Let θ′ be the first action of π_3. Since θ →_hb θ′ in π and θ′ is not an action of t by the definition of a minimal violation, the only possible case is θ →_fr θ′. Hence, θ′ is a write action on y that writes a different value than θ reads.
    - 2.1.1.3. r is buffer-free reachable from w. ρ is an SC execution; the first action of ρ after π_1, π_2|_{t⃗∖{t}} is (α, β); w = instOf((α, β)), r = instOf(θ); and the actions of t in ρ between (α, β) and θ are not instances of a fence instruction or of a write to y.
  - 2.1.2. ρ′ = π_1, π_2|_{t⃗∖{t}}, π_2|_{t} ⊗ π_4, θ′, θ is not a valid SC execution.
    - 2.1.2.1. In the last state of ρ, the value of y seen by t is the value read in θ, which is different from the value written by θ′. However, in the last state of ρ′, the value of y that t sees must be the value θ′ writes. Hence, ρ′ is not a valid SC execution.
- 2.2. Assume that there exist actions γ = (t, rd, z, v_z) and γ′ = (t′, isu)(t′, com, z, v′_z) such that γ occurs before γ′ in π_2. Then, r = instOf(γ) is not a right mover and r is buffer-free reachable from w.
  - 2.2.1. Let i be the index of γ and j be the index of γ′ in π_2. Then, define ρ = π_1, (π_2|_{j−1})|_{t⃗∖{t}}, (π_2|_i)|_{t} ⊗ π_4, γ′. ρ is an SC execution of P.
    - 2.2.1.1. ρ is an SC execution. The prefix π_1, (π_2|_{j−1})|_{t⃗∖{t}} is a valid SC execution because t does not update any shared variable in π_2. Moreover, all of the actions of t in (π_2|_i)|_{t} ⊗ π_4 become enabled after this sequence, since t never reads a value of a variable updated by another thread in π_2, and γ′ is the next enabled action of π_2 after this sequence since it is a write action.

    - 2.2.2.1. In the last state of ρ, the value of z seen by t is v_z. It is different from v′_z, the value written by γ′. However, in the last state of ρ′, the value of z that t sees must be v′_z. Hence, ρ′ is not a valid SC execution.

# **5 Abstractions and Verifying Non-robust Programs**

In this section, we introduce program abstractions which are useful for verifying non-robust TSO programs (or even robust programs – see an example at the end of this section). In general, a program P′ abstracts another program P for a semantic model M ∈ {SC, TSO} if every shared-variable valuation σ reachable from the initial state in an M execution of P is also reachable in an M execution of P′. We denote this abstraction relation by P ⊑_M P′.

In particular, we are interested in *read instruction abstractions*, which replace instructions that read from a shared variable with more "liberal" read instructions that can read more values (this way, the program may reach more shared variable valuations). We extend the program syntax in Sect. 3 with havoc instructions of the form havoc(reg,varbexpr), where varbexpr is a boolean expression over a set of registers and a single shared variable var. The meaning of this instruction is that the register reg is assigned with any value that satisfies varbexpr (where the other registers and the variable var are interpreted with their current values). The program abstraction we consider will replace read instructions of the form reg := var with havoc instructions havoc(reg,varbexpr).

While replacing read instructions with havoc instructions, we must guarantee that the new program reaches at least the same set of shared variable valuations after executing the havoc as the original program after the read. Hence, we allow such a rewriting only when the boolean expression varbexpr is weaker (in a logical sense) than the equality reg = var (hence, there exists an execution of the havoc instruction where reg = var).

**Lemma 2.** *Let* P *be a program and* P′ *be obtained from* P *by replacing an instruction* l_1 : r := x; *goto* l_2; *of a thread* t *with* l_1 : *havoc*(r, φ(x, r⃗)); *goto* l_2; *such that* ∀x, r. x = r ⟹ φ(x, r⃗) *is valid. Then,* P ⊑_SC P′ *and* P ⊑_TSO P′*.*

The notion of trace extends to programs that contain havoc instructions as follows. Assume that (t, hvc, x, φ(x)) is the action generated by an instruction havoc(r, φ(x, −→r )), where x is a shared variable and −→r a set of registers (the

```
procedure foo() {
  x := 1;
  r2 := y;
}

procedure bar() {
  do {
    r1 := x;
    // havoc(r1, (x ≠ 0) ? (r1 = x ∨ r1 = 0) : (r1 = 0))
  } while (...);
  ...
}
```

**Fig. 6.** An example program that needs a read abstraction to pass our robustness checks. The havoc statement in the comment reads as follows: if the value of *x* is not 0, then *r1* gets either the value of *x* or 0; otherwise, it gets 0.

action stores the constraint φ where the values of the registers are instantiated with their current values – the shared variable x is the only free variable in φ(x)). Roughly, the hvc actions are special cases of rd actions. Consider an execution π where an action α = (t, hvc, x, φ(x)) is generated by reading the value of a write action β = (com, x, v) (i.e., the value v was the current value of x when the havoc instruction was executed). Then, the trace of π contains a read-from edge β →_rf α, as for regular read actions. However, fr edges are created differently. If α were a rd action, we would have α →_fr γ whenever β →_rf α and β →_st γ. For the havoc case, the situation is slightly different. Let γ = (com, x, v′) be an action. We have α →_fr γ if and only if either β →_rf α, β →_st γ and φ(v′) is false, or α →_fr γ′ and γ′ →_st γ for some action γ′. Intuitively, there is a from-read dependency from a havoc action to a commit action only when the commit action invalidates the constraint φ(x) of the havoc (or when it follows such a commit in store order).

The notion of write-atomicity (Definition 1) extends to programs with havoc instructions by interpreting havoc instructions havoc(r, φ(x, −→r )) as regular read instructions r := x. Theorem 1 which states that write-atomicity implies robustness can also be easily extended to this case.

Read abstractions are useful in two ways. First, they allow us to prove properties of non-robust programs such as the work stealing queue example in Fig. 3. We can apply appropriate read abstractions to relax the original program so that it becomes robust. Then, we can use SC reasoning tools on the robust program to prove invariants of the program.

Second, read abstractions could be helpful for proving robustness directly. The method based on write-atomicity we propose for verifying robustness is sound but not complete. Some incompleteness scenarios can be avoided using read abstractions. If we can abstract read instructions such that the new program reaches exactly the same states (in terms of shared variables) as the original one, it may help to avoid executions that violate mover checks.

Consider the program in Fig. 6. The write statement x := 1 in procedure foo is not atomic. It is not a left mover due to the read of x in the do-while loop of bar. Moreover, the later read from y is buffer-free reachable from this write and it is not a right mover because of the write to y in bar. To make it atomic, we apply read abstraction to the read instruction of bar that reads from x. In the new relaxed read, r1 can read 0 along with the value of x when x is not zero as shown in the comments below the instruction. With this abstraction, the write to x becomes a left mover because reads from x after the write can now read the old value which was 0. Thus, the program becomes write-atomic. If we think of TSO traces of the abstract program and replace hvc nodes with rd nodes, we get exactly the TSO traces of the original program. However, the abstraction adds more SC traces to the program and the program becomes robust.

# **6 Experimental Evaluation**

To test the practical value of our method, we have considered the benchmark for checking TSO robustness described in [2], which consists of 34 programs. This benchmark is quite exhaustive: it includes the examples introduced in previous works on this subject. Many of the programs in this benchmark are easy to prove write-atomic: no buffer-free read instruction is reachable from any write, which makes the writes trivially atomic (as in the message passing program in Fig. 1). This holds for 20 out of the 34 programs. Of the remaining programs, 13 required mover checks and 4 required read abstractions to show robustness (our method did not succeed on one of the programs in the benchmark, as explained at the end of this section). Except for Chase-Lev, the initial versions of all the 12 examples are trace robust<sup>2</sup>. Besides Chase-Lev, the read-abstracted programs are equivalent to the original ones in terms of reachable shared-variable configurations. Detailed information on these examples can be found in Table 1.

To check whether writes/reads are left/right movers and the soundness of abstractions, we have used the tool Civl [13]. This tool allows proving assertions about concurrent programs (Owicki-Gries annotations) and checking whether an instruction is a left/right mover. The buffer-free read instructions reachable from a write before a fence were obtained using a simple analysis of the control-flow graph (CFG) of the program. This analysis is a sound approximation of the definition in Sect. 4, but it was sufficient for all the examples.

Our method was not precise enough to prove robustness for only one example, named nbw-w-lr-rl in [7]. This program contains a method with explicit calls to the lock and unlock methods of a spinlock. The instruction that writes to the lock variable inside the unlock method is not atomic, because of the reads from the lock variable and the calls to the getAndSet primitive inside the lock method. Abstracting the reads from the lock variable is not sufficient in this case, due to the conflicts with getAndSet actions. However, we believe that read abstractions could be extended to getAndSet instructions (which read and write a shared variable atomically) in order to deal with this example.

<sup>2</sup> If we consider the standard notion of *so* (that relates any two writes on the same variable independent of their values), all examples except MCSLock and dc-locking become non trace robust.

**Table 1.** Benchmark results. The second column (RB) gives the robustness status of the original program according to our extended *hb* definition. Column RA shows the number of read abstractions performed. Column RM gives the number of read instructions checked to be right movers, and column LM the number of write instructions shown to be left movers. PO shows the total number of proof obligations generated, and VT the total verification time in seconds.


### **7 Related Work**

The weakest correctness criterion that enables SC reasoning for proving invariants of programs running under TSO is *state-robustness*, i.e., the requirement that the reachable set of states is the same under both SC and TSO. However, this problem has high complexity (it is non-primitive recursive for programs with a finite number of threads and a finite data domain [6]), so it is difficult to come up with an efficient and precise solution. A symbolic decision procedure is presented in [1], and over-approximate analyses are proposed in [14,15].

Due to the high complexity of state-robustness, stronger correctness criteria with lower complexity have been proposed. Trace-robustness (which we simply call robustness in this paper) is one of the most studied criteria in the literature. Bouajjani et al. [9] have proved that deciding trace-robustness is PSpace-complete, resp. ExpSpace-complete, for a finite, resp. unbounded, number of threads and a finite data domain.

There are various tools for checking trace-robustness. Trencher [7] applies to bounded-thread programs with finite data. In theory, the approach of Trencher can be applied to infinite-state programs, but implementing it is not obvious because it requires solving non-trivial reachability queries in such programs. In comparison, our approach (and our implementation based on Civl) applies to infinite-state programs. All our examples consider infinite data domains, while Chase-Lev, FIFO-iWSQ, LIFO-iWSQ, Anchor-iWSQ, MCSLock, dc-locking and inline pgsql have an unbounded number of threads. Musketeer [4] provides an approximate solution by checking the existence of critical cycles in the control-flow graph. While Musketeer can deal with infinite data (since data is abstracted away), it is restricted to bounded-thread programs; thus, it cannot handle the unbounded-thread examples mentioned above. Furthermore, Musketeer fails to prove robustness of some examples even with finitely many threads, e.g., nbw-w-wr, write+r, r+detours, sb+detours+coh. Other tools for approximate robustness checking, to which we compare in similar ways, have been proposed in [5,10,11].

Besides trace-robustness, there are other correctness criteria like triangular race freedom (Trf) and persistence that are stronger than state-robustness. Persistence [2] is incomparable to trace-robustness, and Trf [18] is stronger than both trace-robustness and persistence. Our method can verify examples that are state-robust but neither persistent nor Trf.

Reduction and abstraction techniques have been used for reasoning about SC programs. Qed [12] is a tool that supports statement transformations as a way of abstracting programs, combined with a mover analysis. Also, Civl [13] allows proving location assertions in the context of the Owicki-Gries logic enhanced with Lipton's reduction theory [17]. Our work enables the use of such tools for reasoning about the TSO semantics of a program.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Quasi-Optimal Partial Order Reduction**

Huyen T. T. Nguyen<sup>1</sup>, César Rodríguez<sup>1,3</sup>, Marcelo Sousa<sup>2</sup>, Camille Coti<sup>1</sup>, and Laure Petrucci<sup>1(B)</sup>

> <sup>1</sup> LIPN, CNRS UMR 7030, Université Paris 13, Sorbonne Paris Cité, Villetaneuse, France {tnguyen,cesar.rodriguez}@lipn.fr, {camille.coti,laure.petrucci}@lipn.univ-paris13.fr <sup>2</sup> University of Oxford, Oxford, UK marcelo.sousa@cs.ox.ac.uk <sup>3</sup> Diffblue Ltd., Oxford, UK

**Abstract.** A dynamic partial order reduction (DPOR) algorithm is optimal when it always explores at most one representative per Mazurkiewicz trace. Existing literature suggests that the reduction obtained by the non-optimal, state-of-the-art Source-DPOR (SDPOR) algorithm is comparable to optimal DPOR. We show the first program with O(n) Mazurkiewicz traces where SDPOR explores O(2<sup>n</sup>) redundant schedules (as this paper was under review, we were made aware of the recent publication of another paper [3] which contains an independently discovered example program with the same characteristics). We furthermore identify the cause of this blow-up as an NP-hard problem. Our main contribution is a new approach, called Quasi-Optimal POR, that can arbitrarily approximate an optimal exploration using a provided constant *k*. We present an implementation of our method in a new tool called Dpu using specialised data structures. Experiments with Dpu, including on Debian packages, show that optimality is achieved with low values of *k*, outperforming state-of-the-art tools.

### **1 Introduction**

Dynamic partial-order reduction (DPOR) [1,10,19] is a mature approach to mitigate the state explosion problem in stateless model checking of multithreaded programs. DPORs are based on Mazurkiewicz trace theory [13], a true-concurrency semantics where the set of executions of the program is partitioned into equivalence classes known as Mazurkiewicz traces (M-traces). In a DPOR, this partitioning is defined by an independence relation over concurrent actions that is computed dynamically, and the method explores executions that are representatives of M-traces. The exploration is *sound* when it explores all M-traces, and it is considered *optimal* [1] when it explores each M-trace only once.

Since two independent actions might have to be explored from the same state in order to explore all M-traces, a DPOR algorithm uses independence to compute a provably-sufficient subset of the enabled transitions to explore for each state encountered. Typically this involves the combination of forward reasoning

**Fig. 1.** (a) Programs; (b) partially-ordered executions;

(persistent sets [11] or source sets [1,4]) with backward reasoning (sleep sets [11]) to obtain a more efficient exploration. However, in order to obtain optimality, a DPOR needs to compute sequences of transitions (as opposed to sets of enabled transitions) that avoid visiting a previously visited M-trace. These sequences are stored in a data structure called *wakeup trees* in [1] and known as *alternatives* in [19]. Computing these sequences thus amounts to deciding whether the DPOR needs to visit yet another M-trace (or all have already been seen).

In this paper, we prove that computing alternatives in an optimal DPOR is an NP-complete problem. To the best of our knowledge this is the first formal complexity result on this important subproblem that optimal and non-optimal DPORs need to solve. The program shown in Fig. 1(a) illustrates a practical consequence of this result: the non-optimal, state-of-the-art SDPOR algorithm [1] can explore O(2<sup>n</sup>) interleavings here, although the program has only O(n) M-traces.

The program contains n := 3 *writer* threads w0, w1, w2, each writing to a different variable. The thread *count* increments a zero-initialized counter c exactly n − 1 times. Thread *master* reads c into variable i and writes to x<sub>i</sub>.

The statements x<sub>0</sub> = 7 and x<sub>1</sub> = 8 are independent because they produce the same state regardless of their execution order. The statement i = c and any statement of the *count* thread are dependent or *interfering*: their execution orders result in different states. Similarly, x<sub>i</sub> = 0 interferes with exactly one *writer* thread, depending on the value of i.

Using this independence relation, the set of executions of this program can be partitioned into six M-traces, corresponding to the six partial orders shown in Fig. 1(b). Thus, an optimal DPOR explores six executions (2n executions for n *writers*). We now show why SDPOR explores O(2<sup>n</sup>) executions in the general case. Conceptually, SDPOR is a loop that (1) runs the program, (2) identifies two dependent statements that can be swapped, and (3) reverses them and re-executes the program. It terminates when no more dependent statements can be swapped.

Consider the interference on the counter variable c between the *master* and the *count* thread. Their execution order determines which *writer* thread interferes with the *master* statement x<sub>i</sub> = 0. If the first increment of c is executed just before i = c, then x<sub>i</sub> = 0 interferes with w1. However, if i = c is executed before it, then x<sub>i</sub> = 0 interferes with w0. Since SDPOR does not track relations between dependent statements, it naively tries to reverse the race between x<sub>i</sub> = 0 and *all writer threads*, which results in exploring O(2<sup>n</sup>) executions. In this program, exploring only six traces requires understanding the entanglement between both interferences, as the order in which the first is reversed determines the second.
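The counting argument can be replayed mechanically. The sketch below (our reconstruction of the Fig. 1(a) program; the statement strings are ours) enumerates all interleavings for n = 3 and groups them by exactly the data the analysis above deems relevant: the value read into i, and whether writer w_i ran before or after the master's x[i] = 0.

```python
from itertools import permutations

def interleavings(threads):
    # all interleavings = multiset permutations of thread identifiers
    ids = [t for t, seq in enumerate(threads) for _ in seq]
    return set(permutations(ids))

def signature(schedule, threads, n):
    # replay one interleaving, keeping only the race-relevant data:
    # the value read into i, and whether w_i ran before x[i] = 0
    pos = [0] * len(threads)
    c, i, order = 0, None, None
    writer_done = [False] * n
    for t in schedule:
        stmt = threads[t][pos[t]]
        pos[t] += 1
        if stmt == "c++":
            c += 1
        elif stmt == "i=c":
            i = c
        elif stmt == "x[i]=0":
            order = writer_done[i]
        else:  # a writer statement "w<k>"
            writer_done[int(stmt[1:])] = True
    return (i, order)

n = 3
threads = [["c++"] * (n - 1), ["i=c", "x[i]=0"]] + [[f"w{k}"] for k in range(n)]
sigs = {signature(s, threads, n) for s in interleavings(threads)}
print(len(sigs))  # 6: the 2n Mazurkiewicz traces for n = 3
```

Each signature pairs one of the n possible values of i with one of the two orders of x[i] = 0 and w_i, matching the 2n partial orders of Fig. 1(b).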

As a trade-off between solving this NP-complete problem and potentially exploring an exponential number of redundant schedules, we propose a hybrid approach called Quasi-Optimal POR (QPOR), which can turn a non-optimal DPOR into an optimal one. In particular, we provide a polynomial algorithm to compute alternative executions that can arbitrarily approximate the optimal solution, based on a user-specified constant k. The key concept is a new notion of k-*partial alternative*, which can intuitively be seen as a "good enough" alternative: it reverts two interfering statements while remembering the resolution of the last k − 1 interferences.

The major differences between QPOR and the DPORs of [1] are that: (1) QPOR is based on prime event structures [17], a partial-order semantics that has recently been applied to programs [19,21], instead of a sequential view of thread interleavings, and (2) it computes k-partial alternatives with an O(n<sup>k</sup>) algorithm, while optimal DPOR corresponds to computing ∞-partial alternatives with an O(2<sup>n</sup>) algorithm. For the program shown in Fig. 1(a), QPOR achieves optimality with k = 2 because each race is coupled with (at most) one other race. As expected, both the cost of computing k-partial alternatives and the reductions obtained by the method increase with higher values of k.

Finding k-partial alternatives requires decision procedures for traversing the causality and conflict relations in event structures. Our main algorithmic contribution is to represent these relations as a set of trees where events are encoded as one or two nodes in two different trees. We show that checking causality/conflict between events amounts to an efficient traversal in one of these trees.

In summary, our main contributions are:


Furthermore, in Sect. 6 we show that: (1) low values of k often achieve optimality; (2) even with non-optimal explorations Dpu greatly outperforms Nidhugg; (3) Dpu copes with production code in Debian packages and achieves much higher state space coverage and efficiency than Maple.

Proofs for all our formal results are available in the unabridged version [15].

### **2 Preliminaries**

In this section we provide the formal background used throughout the paper.

*Concurrent Programs.* We consider deterministic concurrent programs composed of a fixed number of threads that communicate via shared memory and synchronize using mutexes (Fig. 1(a) can be trivially modified to satisfy this). We also assume that local statements modify shared memory only within a mutex block. Therefore, it suffices to consider only races on mutex accesses.

Formally, a *concurrent program* is a structure P := ⟨M, L, T, m<sub>0</sub>, l<sub>0</sub>⟩, where M is the set of *memory states* (valuations of program variables, including instruction pointers), L is the set of *mutexes*, m<sub>0</sub> is the *initial memory state*, l<sub>0</sub> is the *initial mutexes state*, and T is the set of *thread statements*. A thread statement t := ⟨i, f⟩ is a pair where i ∈ N is the *thread identifier* associated with the statement and f : M → (M × Λ) is a *partial* function that models the transformation of the memory as well as the *effect* Λ := {loc} ∪ ({acq, rel} × L) of the statement with respect to thread synchronization. Statements with the loc effect model local thread code. Statements associated with ⟨acq, x⟩ or ⟨rel, x⟩ model lock and unlock operations on a mutex x. Finally, we assume that (1) functions f are PTIME-decidable; (2) acq/rel statements do not modify the memory; and (3) loc statements modify thread-shared memory only within lock/unlock blocks. When (3) is violated, P has a *data race* (undefined behavior in almost all languages), and our technique can be used to find such statements, see Sect. 6.

We use *labelled transition system* (LTS) semantics for our programs. We associate a program P with the LTS M<sub>P</sub> := ⟨S, →, A, s<sub>0</sub>⟩. The set S := M × (L → {0, 1}) contains the *states* of M<sub>P</sub>, i.e., pairs ⟨m, v⟩ where m is the state of the memory and v indicates whether each mutex is locked (1) or unlocked (0). The *actions* in A ⊆ N × Λ are pairs ⟨i, b⟩ where i is the identifier of the thread that executes some statement and b is the effect of the statement. We use the function p: A → N to retrieve the thread identifier. The *transition relation* → ⊆ S × A × S contains a triple ⟨m, v⟩ −⟨i,b⟩→ ⟨m′, v′⟩ exactly when there is some thread statement ⟨i, f⟩ ∈ T such that f(m) = ⟨m′, b⟩ and either (1) b = loc and v′ = v, or (2) b = ⟨acq, x⟩ and v(x) = 0 and v′ = v|<sub>x→1</sub>, or (3) b = ⟨rel, x⟩ and v′ = v|<sub>x→0</sub>. The notation f|<sub>x→y</sub> denotes the function that behaves like f on all inputs except x, which it maps to y. The *initial state* is s<sub>0</sub> := ⟨m<sub>0</sub>, l<sub>0</sub>⟩.

Furthermore, if s −a→ s′ is a transition, the action a is *enabled* at s. Let *enabl*(s) denote the set of actions enabled at s. A sequence σ := a<sub>1</sub>...a<sub>n</sub> ∈ A<sup>∗</sup> is a *run* when there are states s<sub>1</sub>,...,s<sub>n</sub> satisfying s<sub>0</sub> −a<sub>1</sub>→ s<sub>1</sub> ... −a<sub>n</sub>→ s<sub>n</sub>. We define *state*(σ) := s<sub>n</sub>. We let *runs*(M<sub>P</sub>) denote the set of all runs and *reach*(M<sub>P</sub>) := {*state*(σ) ∈ S : σ ∈ *runs*(M<sub>P</sub>)} the set of all *reachable states*.
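The definitions above can be made concrete in a short sketch. The toy program below (our own example: two threads, each performing acq m; c := c + 1; rel m) is encoded as thread statements ⟨i, f⟩ with partial functions f, and the reachable states are computed by an explicit fixpoint following the three transition rules.

```python
def thread_statements(i):
    # three statements per thread: acq m, an increment (loc), rel m;
    # each f is partial: None means the statement does not apply at m
    def step(m, pc_from, update, effect):
        pcs, c = m
        if pcs[i] != pc_from:
            return None
        pcs2 = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
        return ((pcs2, update(c)), effect)
    return [
        (i, lambda m: step(m, 0, lambda c: c, ("acq", "m"))),
        (i, lambda m: step(m, 1, lambda c: c + 1, "loc")),
        (i, lambda m: step(m, 2, lambda c: c, ("rel", "m"))),
    ]

def successors(state, T):
    # the three transition rules for loc / acq / rel effects
    m, v = state
    for i, f in T:
        r = f(m)
        if r is None:
            continue
        m2, b = r
        if b == "loc":
            yield (m2, v)
        elif b[0] == "acq" and v.get(b[1], 0) == 0:
            yield (m2, {**v, b[1]: 1})
        elif b[0] == "rel":
            yield (m2, {**v, b[1]: 0})

def reach(s0, T):
    # explicit fixpoint over states (m, v)
    frozen = lambda s: (s[0], tuple(sorted(s[1].items())))
    seen, stack = set(), [s0]
    while stack:
        s = stack.pop()
        if frozen(s) in seen:
            continue
        seen.add(frozen(s))
        stack.extend(successors(s, T))
    return seen

T = thread_statements(0) + thread_statements(1)
s0 = (((0, 0), 0), {"m": 0})   # memory = (program counters, c); mutex m unlocked
states = reach(s0, T)
# the mutex enforces mutual exclusion of the two lock blocks
assert all(not (1 <= m[0][0] <= 2 and 1 <= m[0][1] <= 2) for m, v in states)
print(len(states))
```

The assertion checks that no reachable state has both threads inside their lock blocks, which follows from rule (2) blocking acq when v(x) = 1.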

*Independence.* Dynamic partial-order reduction methods use a notion called independence to avoid exploring concurrent interleavings that lead to the same state. We recall the standard notion of independence for actions in [11]. Two actions a, a′ ∈ A *commute at* a state s ∈ S iff

– if a ∈ *enabl*(s) and s −a→ s′, then a′ ∈ *enabl*(s) iff a′ ∈ *enabl*(s′); and

– if a, a′ ∈ *enabl*(s), then there is a state s′ such that s −a.a′→ s′ and s −a′.a→ s′.

Independence between actions is an under-approximation of commutativity. A binary relation ♦ ⊆ A × A is an *independence* on M<sub>P</sub> if it is symmetric, irreflexive, and every pair ⟨a, a′⟩ in ♦ commutes at every state in *reach*(M<sub>P</sub>).

In general M<sub>P</sub> has multiple independence relations; clearly ∅ is always one of them. We define the relation ♦<sub>P</sub> ⊆ A × A as the smallest irreflexive, symmetric relation where ⟨i, b⟩ ♦<sub>P</sub> ⟨i′, b′⟩ holds if i ≠ i′ and either b = loc or b = ⟨acq, x⟩ and b′ ∉ {⟨acq, x⟩, ⟨rel, x⟩}. By construction ♦<sub>P</sub> is always an independence.
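As a sketch, an independence check in this spirit can be coded over actions ⟨i, b⟩ represented as Python pairs; here actions of distinct threads are deemed independent unless both operate on the same mutex (a conservative reading of the relation above, not its verbatim definition).

```python
def mutex_of(b):
    # b is either "loc" or a pair ("acq"|"rel", x)
    return b[1] if isinstance(b, tuple) else None

def independent(a1, a2):
    (i1, b1), (i2, b2) = a1, a2
    if i1 == i2:
        return False  # irreflexive; same-thread actions are never independent
    m1, m2 = mutex_of(b1), mutex_of(b2)
    # different threads commute unless both touch the same mutex
    return m1 is None or m2 is None or m1 != m2

print(independent((0, "loc"), (1, ("acq", "m"))))        # True
print(independent((0, ("acq", "m")), (1, ("rel", "m"))))  # False
print(independent((0, ("acq", "m")), (1, ("acq", "n"))))  # True
```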

*Labelled Prime Event Structures.* *Prime event structures* (pes) are a well-known non-interleaving, partial-order semantics [7,8,16]. Let X be a set of actions. A pes *over* X is a structure E := ⟨E, <, #, h⟩ where E is a set of *events*, < ⊆ E × E is a strict partial order called the *causality relation*, # ⊆ E × E is a symmetric, irreflexive *conflict relation*, and h: E → X is a labelling function. Causality represents the happens-before relation between events, and a conflict between two events expresses that any execution includes at most one of them. Figure 2(b) shows a pes over N × Λ where causality is depicted by arrows, conflicts by dotted lines, and the labelling h is shown next to the events, e.g., 1 < 5, 8 < 12, 2 # 8, and h(1) = ⟨0, loc⟩. The *history* of an event e, ⌈e⌉ := {e′ ∈ E : e′ < e}, is the least set of events that need to happen before e.

The notion of concurrent execution in a pes is captured by the concept of *configuration*. A configuration is a (partially ordered) execution of the system, i.e., a set C ⊆ E of events that is *causally closed* (if e ∈ C, then ⌈e⌉ ⊆ C) and *conflict-free* (if e, e′ ∈ C, then ¬(e # e′)). In Fig. 2(b), the set {8, 9, 15} is a configuration, but {3} and {1, 2, 8} are not. We let *conf*(E) denote the set of all configurations of E, and [e] := ⌈e⌉ ∪ {e} the *local configuration* of e. In Fig. 2(b), [11] = {1, 8, 9, 10, 11}. A configuration represents a set of *interleavings* over X. An interleaving is a sequence in X<sup>∗</sup> that labels a topological sorting of the events in C. In Fig. 2(b), *inter*({1, 8}) = {ab, ba} with a := ⟨0, loc⟩ and b := ⟨1, acq m⟩.

The *extensions* of C are the events not in C whose histories are included in C: *ex*(C) := {e ∈ E : e ∉ C ∧ ⌈e⌉ ⊆ C}. The *enabled* events of C are the extensions that can form a larger configuration: *en*(C) := {e ∈ *ex*(C) : C ∪ {e} ∈ *conf*(E)}. Finally, the *conflicting extensions* of C are the extensions that are not enabled: *cex*(C) := *ex*(C) \ *en*(C). In Fig. 2(b), *ex*({1, 8}) = {2, 9, 15}, *en*({1, 8}) = {9, 15}, and *cex*({1, 8}) = {2}. See [20] for more information on pes concepts.
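A minimal sketch of these operators on a toy pes of our own (not Fig. 2(b)): event a conflicts with b, c causally depends on a, and d is concurrent with everything.

```python
from itertools import combinations

EVENTS = {"a", "b", "c", "d"}
CAUSES = {"c": {"a"}}                 # immediate causes; history = transitive closure
CONFLICT = {frozenset({"a", "b"})}    # symmetric, irreflexive conflict

def history(e):
    h = set(CAUSES.get(e, set()))
    for e2 in list(h):
        h |= history(e2)
    return h

def conflict_free(s):
    return all(frozenset(p) not in CONFLICT for p in combinations(s, 2))

def is_config(s):
    # causally closed and conflict-free
    return all(history(e) <= s for e in s) and conflict_free(s)

def ex(c):
    # extensions: events outside c whose history is included in c
    return {e for e in EVENTS - c if history(e) <= c}

def en(c):
    # enabled: extensions that form a larger configuration
    return {e for e in ex(c) if is_config(c | {e})}

def cex(c):
    # conflicting extensions: extensions that are not enabled
    return ex(c) - en(c)

c = {"a"}
print(sorted(ex(c)))   # ['b', 'c', 'd']
print(sorted(en(c)))   # ['c', 'd']
print(sorted(cex(c)))  # ['b']
```

Event b extends {a} (its history ∅ is included) but is not enabled, since a # b: it is a conflicting extension.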

*Parametric Unfolding Semantics.* We recall the program pes semantics of [19,20] (modulo notation differences). For a program P and any independence ♦ on M<sub>P</sub> we define a pes U<sub>P,♦</sub> that represents the behavior of P, i.e., such that the set of interleavings of its configurations equals *runs*(M<sub>P</sub>).

Each event in U<sub>P,♦</sub> is defined by a canonical name of the form e := ⟨a, H⟩, where a ∈ A is an action of M<sub>P</sub> and H is a configuration of U<sub>P,♦</sub>. Intuitively, e represents the action a after the *history* (or the causes) H. Figure 2(b) shows an example. Event 11 is ⟨⟨0, acq m⟩, {1, 8, 9, 10}⟩ and event 1 is ⟨⟨0, loc⟩, ∅⟩. Note the inductive nature of the name, and how it uniquely identifies each event. We define the *state of a configuration* as the state reached by *any* of its interleavings. Formally, for C ∈ *conf*(U<sub>P,♦</sub>) we define *state*(C) as s<sub>0</sub> if C = ∅

**Fig. 2.** (a) A program *<sup>P</sup>*; (b) its unfolding semantics <sup>U</sup>*P,*♦*<sup>P</sup>* .

and as *state*(σ) for some σ ∈ *inter*(C) if C ≠ ∅. Despite appearances, *state*(C) is well-defined because *all* sequences in *inter*(C) reach the *same* state; see [20] for a proof.

**Definition 1 (Unfolding).** *Given a program* P *and an independence relation* ♦ *on* M<sub>P</sub> := ⟨S, →, A, s<sub>0</sub>⟩*, the* unfolding of P under ♦*, denoted* U<sub>P,♦</sub>*, is the* pes *over* A *constructed by the following fixpoint rules:*


Step 1 creates an empty pes with only one (empty) configuration. Step 2 inserts a new event ⟨a, C⟩ by finding a configuration C that enables an action a which is dependent with all causality-maximal events in C. In Fig. 2, this initially creates events 1, 8, and 15. For event 1 := ⟨⟨0, loc⟩, ∅⟩, this is because action ⟨0, loc⟩ is enabled at *state*(∅) = s<sub>0</sub> and there is no <-maximal event in ∅ to consider. Similarly, the state of C<sub>1</sub> := {1, 8, 9, 10} enables action a<sub>1</sub> := ⟨0, acq m⟩, and both h(1) and h(10) are dependent with a<sub>1</sub> in ♦<sub>P</sub>. As a result ⟨a<sub>1</sub>, C<sub>1</sub>⟩ is an event (number 11). Furthermore, while a<sub>2</sub> := ⟨0, loc⟩ is enabled at *state*(C<sub>2</sub>), with C<sub>2</sub> := {8, 9, 10}, a<sub>2</sub> is independent of h(10), and so ⟨a<sub>2</sub>, C<sub>2</sub>⟩ is not an event.

After inserting an event e := ⟨a, C⟩, Definition 1 declares all events in C causal predecessors of e. For any event e′ in E but not in [e] such that h(e′) is dependent with a, the order of execution of e and e′ yields different states. We thus set them in conflict. In Fig. 2, we set 2 # 8 because h(2) is dependent with h(8) and 2 ∉ [8] and 8 ∉ [2].

# **3 Unfolding-Based DPOR**

This section presents an algorithm that exhaustively explores all deadlock states of a given program (a *deadlock* is a state where no thread is enabled).


**Algorithm 1.** Unfolding-based POR exploration. See text for definitions.

For the rest of the paper, unless otherwise stated, we let P be a *terminating* program (i.e., *runs*(M<sub>P</sub>) is a finite set of finite sequences) and ♦ an independence on M<sub>P</sub>. Consequently, U<sub>P,♦</sub> has finitely many events and configurations.

Our POR algorithm (Algorithm 1) analyzes P by exploring the configurations of U<sub>P,♦</sub>. It visits all ⊆-maximal configurations of U<sub>P,♦</sub>, which correspond to the deadlock states in *reach*(M<sub>P</sub>), and organizes the exploration as a binary tree.

Explore(*C,D,A*) has a global set U that stores all events of U<sub>P,♦</sub> discovered so far. The three arguments are: C, the configuration to be explored; D (for *disabled*), a set of events that shall never again be included in the explored configuration; and A (for *add*), used to direct the exploration towards a configuration that conflicts with D. A call to Explore(*C,D,A*) visits all maximal configurations of U<sub>P,♦</sub> that contain C and do not contain D, and the first one explored contains C ∪ A.

The algorithm first adds *ex*(C) to U. If C is a maximal configuration (i.e., no event is enabled), then line 5 returns. If C is not maximal but *en*(C) ⊆ D, then all events that could be added to C have already been explored and this call was redundant work; in this case the algorithm also returns, and we say that it has explored a *sleep-set blocked* (SSB) execution [1]. Algorithm 1 next selects an event e enabled at C, if possible from A (lines 7 and 9), and makes a recursive call (left subtree) that explores *all* configurations containing every event of C ∪ {e} and no event from D. Since that call visits all maximal configurations containing C and e, it remains to visit those containing C but not e. At line 11 we determine whether any such configuration exists. Function Alt returns a set of configurations, so-called *clues*. A clue is a witness that a ⊆-maximal configuration exists in U<sub>P,♦</sub> which contains C but not D ∪ {e}.

**Definition 2 (Clue).** *Let* D *and* U *be sets of events, and* C *a configuration such that* C ∩ D = ∅*. A* clue *to* D *after* C *in* U *is a configuration* J ⊆ U *such that* C ∪ J *is a configuration and* D ∩ J = ∅*.*

**Definition 3 (**Alt **function).** *Function* Alt *denotes* any *function such that* Alt(*B,F*) *returns a set of clues to* F *after* B *in* U*, and the set is non-empty if* U*P,*♦ *has at least one maximal configuration* C *where* B ⊆ C *and* C ∩ F = ∅*.*

When Alt returns a clue J, the clue is passed to the second recursive call (line 12) to "mark the way" (using set A) in the subsequent recursive calls at line 10, guiding the exploration towards the maximal configuration that J witnesses. Definition 3 does not identify a concrete implementation of Alt. Rather, it indicates how Alt must be implemented so that Algorithm 1 terminates and is complete (see below). Different PORs in the literature can be reframed in terms of Algorithm 1. SDPOR [1] uses clues that mark the way only one event ahead (|J \ C| = 1) and can hit SSBs. Optimal DPORs [1,19] use size-varying clues that guide the exploration while provably avoiding any SSB.

Algorithm 1 is *optimal* when it does not explore any SSB. To make Algorithm 1 optimal, Alt needs to return clues that are *alternatives* [19], which satisfy stronger constraints. In that case, Algorithm 1 is equivalent to the DPOR in [19] and becomes optimal (see [20] for a proof).

**Definition 4 (Alternative** [19]**).** *Let* D *and* U *be sets of events and* C *a configuration such that* C ∩ D = ∅*. An* alternative *to* D *after* C *in* U *is a clue* J *to* D *after* C *in* U *such that* ∀e ∈ D : ∃e ∈ J*,* e # e *.*

Line 13 removes from U events that will not be necessary for Alt to find clues in the future. The events preserved, Q*C,D* := C ∪ D ∪ #(C ∪ D), include all events in C ∪ D as well as every event in U that is in conflict with some event in C ∪ D. The preserved events will suffice to compute alternatives [19], but other non-optimal implementations of Alt could allow for more aggressive pruning.

The ⊆-maximal configurations of Fig. 2(b) are [7] ∪ [17], [14], and [19]. Our algorithm starts at configuration C = ∅. After 10 recursive calls it visits C = [7] ∪ [17]. Then it backtracks to C = {1}, calls Alt({1}, {2}), which provides, e.g., J = {1, 8}, and visits C = {1, 8} with D = {2}. After 6 more recursive calls it visits C = [14], backtracks to C = [12], calls Alt([12], {2, 13}), which provides, e.g., J = {15}, and after two more recursive calls it visits C = [12] ∪ {15} with D = {2, 13}. Finally, after 4 more recursive calls it visits C = [19].
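The recursion can be prototyped on an explicitly given finite pes. In the sketch below (a toy pes of our own, not Fig. 2(b)), Alt is implemented naively by searching a precomputed set of maximal configurations; this satisfies Definition 3, so the exploration is complete, but it makes no optimality claim.

```python
from itertools import combinations

EVENTS = {"a", "b", "c", "d"}
CAUSES = {"c": {"a"}}               # a toy pes: a # b, a < c, d concurrent
CONFLICT = {frozenset({"a", "b"})}

def history(e):
    h = set(CAUSES.get(e, set()))
    for e2 in list(h):
        h |= history(e2)
    return h

def conflict_free(s):
    return all(frozenset(p) not in CONFLICT for p in combinations(s, 2))

def en(c):
    # events enabled at configuration c
    return {e for e in EVENTS - c
            if history(e) <= c and conflict_free(c | {e})}

def maximal_configs():
    # ground truth by brute force; also used by the naive Alt below
    out, todo = set(), [frozenset()]
    while todo:
        c = todo.pop()
        es = en(c)
        if not es:
            out.add(c)
        todo.extend(c | {e} for e in es)
    return out

MAX = maximal_configs()

def alt(c, f):
    # per Definition 3: any maximal configuration extending c and avoiding f
    return next((j for j in MAX if c <= j and not (j & f)), None)

visited = set()

def explore(c, d, a):
    es = en(c)
    if not es:
        visited.add(frozenset(c))    # c is ⊆-maximal
        return
    if es <= d:
        return                       # sleep-set blocked execution
    e = next(iter((a & es) or (es - d)))
    explore(c | {e}, d, a - {e})     # left subtree: explore with e
    j = alt(c, d | {e})
    if j is not None:                # right subtree: explore without e
        explore(c, d | {e}, set(j) - c)

explore(set(), set(), set())
assert visited == MAX
print(sorted(sorted(c) for c in visited))  # [['a', 'c', 'd'], ['b', 'd']]
```

The final assertion is the completeness property of Theorem 2 on this toy instance: every ⊆-maximal configuration is visited.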

Finally, we turn to the correctness of Algorithm 1 and prove its termination and completeness:

### **Theorem 1 (Termination).** *Regardless of its input, Algorithm 1 always stops.*

**Theorem 2 (Completeness).** *Let* Ĉ *be a* ⊆*-maximal configuration of* U<sub>P,♦</sub>*. Then Algorithm 1 calls* Explore(*C,D,A*) *at least once with* C = Ĉ*.*

### **4 Complexity**

This section presents complexity results about the only non-trivial steps in Algorithm 1: computing *ex*(C) and the call to Alt(·, ·). An implementation of Alt(*B,F*) that systematically returns B would satisfy Definition 3, but would also render Algorithm 1 unusable (equivalent to a DFS in M<sub>P</sub>). On the other hand, the algorithm becomes optimal when Alt returns alternatives. Optimality comes at a cost:

**Theorem 3.** *Given a finite* pes <sup>E</sup>*, some configuration* <sup>C</sup> <sup>∈</sup> *conf* (E)*, and a set* D ⊆ *ex* (C)*, deciding if an alternative to* D *after* C *exists in* E *is NP-complete.*

Theorem <sup>3</sup> assumes that <sup>E</sup> is an arbitrary pes. Assuming that <sup>E</sup> is the unfolding of a program P under ♦*<sup>P</sup>* does not reduce this complexity:

**Theorem 4.** *Let* P *be a program and* U *a causally-closed set of events from* U*P,*♦*<sup>P</sup> . For any configuration* C ⊆ U *and any* D ⊆ *ex* (C)*, deciding if an alternative to* D *after* C *exists in* U *is NP-complete.*

These complexity results lead us to consider (in the next section) new approaches that avoid the NP-hardness of computing alternatives while still retaining their capacity to prune the search.

Finally, we focus on the complexity of computing *ex*(C), which essentially reduces to computing *cex*(C), as computing *en*(C) is trivial. Assuming that E is given, computing *cex*(C) for some C ∈ *conf*(E) is a linear problem. However, for any realistic implementation of Algorithm 1, E is not available (the very goal of Algorithm 1 is to find all of its events). So a useful complexity result about *cex*(C) must necessarily refer to the original system under analysis. When E is the unfolding of a Petri net [14], computing *cex*(C) is NP-complete:

**Theorem 5.** *Let* N *be a Petri net,* t *a transition of* N*,* E *the unfolding of* N*, and* C *a configuration of* E*. Deciding whether* h<sup>−1</sup>(t) ∩ *cex*(C) ≠ ∅ *is NP-complete.*

Fortunately, computing *cex*(C) for programs is a much simpler task. Function cexp(*C*), shown in Algorithm 1, computes and returns *cex*(C) when E is the unfolding of a program. We explain cexp(*C*) in detail in Sect. 5.3. Assuming that the functions pt and pm can be computed in constant time, and that the relation < can be decided in O(log |C|) time, as we will show, cexp clearly works in time O(n<sup>2</sup> log n), where n := |C|, since both loops are bounded by the size of C.

# **5 New Algorithm for Computing Alternatives**

This section introduces a new class of clues, called k-partial alternatives. These can arbitrarily reduce the number of redundant explorations (SSBs) performed by Algorithm 1 and can be computed in polynomial time. Specialized data structures and algorithms for k-partial alternatives are also presented.

**Definition 5 (k-partial alternative).** *Let* U *be a set of events,* C ⊆ U *a configuration,* D ⊆ U *a set of events, and* k ∈ N *a number. A configuration* J *is a* k*-*partial alternative *to* D *after* C *if there is some* D̂ ⊆ D *such that* |D̂| = k *and* J *is an alternative to* D̂ *after* C*.*

A k-partial alternative needs to conflict with only k (instead of all) events in D. An alternative is thus an ∞-partial alternative. If we reframe SDPOR in terms of Algorithm 1, it becomes an algorithm using *singleton 1-partial* alternatives. While k-partial alternatives are a very simple concept, much of their simplicity stems from the fact that they are expressed within the elegant framework of pes semantics. Defining the same concept on top of sequential semantics (often used in the POR literature [1,2,9–11,23]) would have required a much more complex device.

We compute k-partial alternatives using a comb data structure:

**Definition 6 (Comb).** *Let* A *be a set. An* A-comb c *of size* n ∈ ℕ *is an ordered collection of* spikes s₁, ..., sₙ*, where each spike* sᵢ ∈ A* *is a sequence of elements over* A*. Furthermore, a* combination *over* c *is any tuple* ⟨a₁, ..., aₙ⟩ *where each* aᵢ *is an element of the spike* sᵢ*.*
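Operationally, the combinations over a comb are exactly the Cartesian product of its spikes. A minimal Python sketch (our own encoding, not from the paper, with events represented by integers):

```python
from itertools import product

def combinations(comb):
    """Enumerate all combinations over a comb (Definition 6): a comb is
    an ordered list of spikes, each spike a sequence of events, and a
    combination picks one event per spike."""
    return product(*comb)

# A comb of size 3 with spikes [1, 2], [3], [4, 5]:
comb = [[1, 2], [3], [4, 5]]
all_combs = list(combinations(comb))
assert len(all_combs) == 4          # 2 * 1 * 2 combinations
assert (1, 3, 5) in all_combs
```

Roughly, the alternative-search procedure of this section fills one spike per chosen event of D with candidate conflicting events and scans this product for a combination that, together with C, induces a clue.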

It is possible to compute k-partial alternatives (and by extension optimal alternatives) to D after C in U using a comb, as follows:


Step 3 guarantees that J is a clue. Steps 1 and 2 guarantee that it will conflict with at least k events from D. It is straightforward to prove that the procedure will find a k-partial alternative to D after C in U when an ∞-partial alternative to D after C exists in U. It can thus be used to implement Definition 3.

Steps 2, 3, and 4 require deciding whether a given pair of events is in conflict, and step 3 additionally requires deciding whether two events are causally related. Efficiently computing k-partial alternatives thus reduces to efficiently computing causality and conflict between events.

### **5.1 Computing Causality and Conflict for PES Events**

In this section we introduce an efficient data structure for deciding whether two events in the unfolding of a program are causally related or in conflict.

As in Sect. 3, let P be a program, M*P* its LTS semantics, and ♦*P* its independence relation (defined in Sect. 2). Additionally, let E denote the pes U*P,♦P* of P extended with a new event ⊥ that causally precedes every event in U*P,♦P*.

The unfolding E represents the dependency of actions in M*P* through the causality and conflict relations between events. By definition of ♦*P* we know that for any two events e, e′ ∈ E:

– If e and e′ are events from the same thread, then they are either causally related or in conflict.

– If e and e′ are lock/unlock operations on the same variable, then similarly they are either causally related or in conflict.

This means that the causality/conflict relations between all events of one thread can be tracked using a tree. For every thread of the program we define and maintain a so-called *thread tree*. Each event of the thread has a corresponding node in the tree. A tree node n is the parent of another tree node n′ iff the event associated with n is the immediate causal predecessor of the event associated with n′. That is, the ancestor relation of the tree encodes the causality relations of events in the thread, and the branching of the tree represents conflict. Given two events e, e′ of the same thread we have that e < e′ iff ¬(e # e′) iff the tree node of e is an ancestor of the tree node of e′.

We apply the same idea to track causality/conflict between acq and rel events. For every lock l ∈ L we maintain a separate *lock tree*, containing a node for each event labelled by either ⟨acq, l⟩ or ⟨rel, l⟩. As before, the ancestor relation in a lock tree encodes the causality relations of all events represented in that tree. Events of type acq/rel have tree nodes in both their lock and thread trees. Events of loc actions are associated with a single node, in the thread tree.

This idea yields a procedure to decide a causality/conflict query for two events when they belong to the same thread or operate on the same lock. But we still need to decide causality and conflict for the remaining pairs of events, e.g., loc events of different threads. Again by construction of ♦*P*, the only sources of causality/conflict between two such events are the causality/conflict relations between their causal predecessors. These relations can be summarized by keeping two mappings for each event:

**Definition 7.** *Let* e ∈ E *be an event of* E*. We define the* thread mapping *tmax* : E × ℕ → E *as the unique function that maps every pair* (e, i) *to the unique* <*-maximal event from thread* i *in* [e]*, or* ⊥ *if* [e] *contains no event from thread* i*. Similarly, the* lock mapping *lmax* : E × L → E *maps every pair* (e, l) *to the unique* <*-maximal event* e′ ∈ [e] *such that* h(e′) *is an action of the form* ⟨acq, l⟩ *or* ⟨rel, l⟩*, or* ⊥ *if no such event exists in* [e]*.*

The information stored by the thread and lock mappings enables us to decide causality and conflict queries for arbitrary pairs of events:

**Theorem 6.** *Let* e, e′ ∈ E *be two arbitrary events from threads* i *and* i′ *respectively, with* i ≠ i′*. Then* e < e′ *holds iff* e ≤ *tmax*(e′, i)*, and* e # e′ *holds iff there is some* l ∈ L *such that* *lmax*(e, l) # *lmax*(e′, l)*.*

As a consequence of Theorem 6, deciding whether two events are related by causality or conflict reduces to deciding whether one of two nodes from the *same* lock or thread tree is an ancestor of the other.
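Theorem 6 can be read as code: a cross-thread causality query becomes a single ancestor query inside one thread tree. The sketch below uses plain parent-pointer nodes as a stand-in for the trees of Sect. 5.2, and treats tmax, thread, and node as lookup functions supplied by the unfolding; all names are ours, not from the paper:

```python
class TNode:
    """Plain tree node with a parent pointer (a minimal stand-in for
    the skip-pointer trees introduced in Sect. 5.2)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1

def is_ancestor(a, b):
    """Is node a an ancestor of node b (reflexively)?"""
    while b.depth > a.depth:
        b = b.parent
    return a is b

def causally_before(e1, e2, tmax, thread, node):
    """Theorem 6 (sketch): for events e1, e2 of distinct threads,
    e1 < e2 iff e1 <= tmax(e2, thread(e1)), decided as an ancestor
    query in the tree of thread(e1)."""
    m = tmax(e2, thread(e1))   # <-maximal thread(e1)-event in [e2]
    return m is not None and is_ancestor(node(e1), node(m))
```

Here `tmax(e, i)` returns the event stored by the thread mapping (or None for ⊥), and `node(e)` returns the thread-tree node of event e.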

### **5.2 Computing Causality and Conflict for Tree Nodes**

This section presents an efficient algorithm to decide if two nodes of a tree are ancestors. The algorithm is similar to a search in a skip list [18].

Let ⟨N, →, r⟩ denote a tree, where N is a set of *nodes*, → ⊆ N × N is the *parent relation*, and r ∈ N is the root. Let d(n) be the depth of each node in the tree, with d(r) = 0. A node n is an *ancestor* of n′ if it belongs to the unique path from r to n′. Finally, for a node n ∈ N and some integer g ∈ ℕ such that g ≤ d(n), let q(n, g) denote the unique ancestor n′ of n such that d(n′) = g.

Given two *distinct* nodes n, n′ ∈ N, we need to efficiently decide whether n is an ancestor of n′. The key idea is that if d(n) = d(n′), then the answer is clearly negative; and if the depths differ, w.l.o.g. d(n) < d(n′), then n is an ancestor of n′ iff the nodes n and n″ := q(n′, d(n)) are the same node.

To find n″ from n′, a linear traversal of the branch starting from n′ would be expensive for deep trees. Instead, we propose to use a data structure similar to a skip list. Each node stores a pointer to the parent node *and* also a number of pointers to ancestor nodes at distances s¹, s², s³, ..., where s ∈ ℕ is a user-defined *step*. The number of pointers stored at a node n is equal to the number of trailing zeros in the s-ary representation of d(n). For instance, for s := 2, a node at depth 4 stores 2 pointers (apart from the pointer to the parent), pointing to the ancestors at depth 4 − s¹ = 2 and depth 4 − s² = 0. Similarly, a node at depth 12 stores a pointer to its parent (at depth 11) and pointers to the ancestors at depths 10 and 8. With this data structure, computing q(n, g) requires traversing O(log(d(n) − g)) nodes of the tree.
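A sketch of this skip-pointer scheme in Python (class and function names are ours; the step s defaults to 2 as in the example above):

```python
class Node:
    """Tree node storing, besides the parent pointer, skip pointers to
    the ancestors at distances s**1, s**2, ..., one pointer per
    trailing zero of d(n) in base s (sketch of Sect. 5.2)."""
    def __init__(self, parent=None, s=2):
        self.parent = parent
        self.s = s
        self.depth = 0 if parent is None else parent.depth + 1
        self.skips = []                  # ancestors at depth d - s**i
        if parent is not None:
            d, zeros = self.depth, 0
            while d % s == 0:            # trailing zeros in base s
                d //= s
                zeros += 1
            for i in range(1, zeros + 1):
                self.skips.append(self.q(self.depth - s ** i))

    def q(self, g):
        """q(n, g): the unique ancestor of n at depth g <= d(n),
        taking the deepest admissible jump at each step."""
        n = self
        while n.depth > g:
            step = n.parent              # always admissible
            for sk in n.skips:           # ordered by decreasing depth
                if sk.depth >= g:
                    step = sk
            n = step
        return n

def is_ancestor(n, n2):
    """n is an ancestor of n2 iff d(n) <= d(n2) and q(n2, d(n)) is n."""
    return n.depth <= n2.depth and n2.q(n.depth) is n
```

For example, in a chain built with s = 2, the node at depth 12 ends up with skip pointers to the ancestors at depths 10 and 8, matching the description above.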

### **5.3 Computing Conflicting Extensions**

We now explain how function cexp(C) in Algorithm 1 works. A call to cexp(C) constructs and returns all events in *cex*(C). The function works only when the pes being explored is the unfolding of a program P under the independence relation ♦*P*. Owing to the properties of U*P,♦P*, all events in *cex*(C) are labelled by acq actions. Broadly speaking, this is because only actions from different threads that are co-enabled *and* dependent create conflicts in U*P,♦P*, and this is only possible for acq statements. For the same reason, an event labelled by a := ⟨i, acq, l⟩ exists in *cex*(C) iff there is some event e ∈ C such that h(e) = a.

Function cexp exploits these facts and the lock tree introduced in Sect. 5.1 to compute *cex*(C). Intuitively, it finds every event e labelled by an ⟨acq, l⟩ statement and tries to "execute" it before the ⟨rel, l⟩ that happened before e (if there is one). If it can, it creates a new event ê with the same label as e.

Function pt(*e*) returns the only immediate causal predecessor of event e in its own thread. For an acq/rel event e, function pm(*e*) returns the parent node of event e in its lock tree (or ⊥ if e is the root). So for an acq event it returns a rel event, and for a rel event it returns an acq event.

# **6 Experimental Evaluation**

We implemented QPOR in a new tool called Dpu (*Dynamic Program Unfolder*, available at https://github.com/cesaro/dpu/releases/tag/v0.5.2). Dpu is a stateless model checker for C programs with POSIX threading. It uses the LLVM infrastructure to parse, instrument, and JIT-compile the program, which is assumed to be data-deterministic. It implements k-partial alternatives (k is an input), optimal POR, and context-switch bounding [6].

Dpu does not use data races as a source of thread interference for POR: it will not explore two execution orders for two instructions that exhibit a data race. However, it can be instructed to detect and report data races found during the POR exploration. When requested, this detection happens for a user-provided percentage of the executions explored by POR.

### **6.1 Comparison to SDPOR**

In this section we investigate the following experimental questions: (a) How does QPOR compare against SDPOR? (b) For which values of k do k-partial alternatives yield optimal exploration?

We use realistic programs that expose complex thread synchronization patterns, including a job dispatcher, a multiple-producer multiple-consumer scheme, parallel computation of π, and a thread pool. Complex synchronization patterns are frequent in these examples, including nested and intertwined critical sections and conditional interactions between threads based on the processed data; they provide a means to highlight the differences between POR approaches and drive improvement. Each program contains between 2 and 8 assertions, often ensuring invariants of the used data structures. All programs are safe and have between 90 and 200 lines of code. We also considered the SV-COMP'17 benchmarks, but almost all of them contain very simple synchronization patterns, not representative of more complex concurrent algorithms. On these benchmarks QPOR and SDPOR perform an almost identical exploration: both time out on exactly the same instances, and both find exactly the same bugs.

In Table 1, we present a comparison between Dpu and Nidhugg [2], an efficient implementation of SDPOR for multithreaded C programs. We run k-partial alternatives with k ∈ {1, 2, 3} and optimal alternatives. The number of SSB executions dramatically decreases as k increases: with k = 3 almost no instance produces SSBs (except MPC(4,5)), and optimality is achieved with k = 4. Programs with simple synchronization patterns, e.g., the Pi benchmark, are explored optimally both with k = 1 and by SDPOR, while more complex synchronization patterns require k > 1.

Overall, if the benchmark exhibits many SSBs, the run time decreases as k increases, and optimal exploration is the fastest option. However, when the benchmark contains few SSBs (cf. Mpat, Pi, Poke), k-partial alternatives can be slightly faster than optimal POR, an observation in line with previous literature [1]. Code profiling revealed that when the comb is large and contains many solutions, both optimal and non-optimal POR easily find them, but optimal POR spends additional time constructing a larger comb. This suggests that optimal POR would profit from a lazy comb-construction algorithm.

Dpu is faster than Nidhugg in the majority of the benchmarks because it can greatly reduce the number of SSBs. In the cases where both tools explore the


**Table 1.** Comparing QPOR and SDPOR. Machine: Linux, Intel Xeon 2.4 GHz. TO: timeout after 8 min. Columns are: Th: nr. of threads; Confs: maximal configurations; Time in seconds, Memory in MB; SSB: Sleep-set blocked executions. N/A: analysis with lower *k* yielded 0 SSBs.

same set of executions, Dpu is in general faster than Nidhugg because it JIT-compiles the program, while Nidhugg interprets it. All the benchmarks in Table 1 are data-race-free, but Nidhugg cannot be instructed to ignore data races and will attempt to revert them. Dpu was run with data-race detection disabled; enabling it incurs approximately 10% overhead. In contrast with previous observations [1,2], the results in Table 1 show that SSBs can dramatically slow down the execution of SDPOR.

### **6.2 Evaluation of the Tree-Based Algorithms**

We now evaluate the efficiency of our tree-based algorithms from Sect. 5 answering: (a) What are the average/maximal depths of the thread/lock sequential trees? (b) What is the average depth difference on causality/conflict queries? (c) What is the best step for branch skip lists? We do not compare our algorithms against others because to the best of our knowledge none is available (other than a naive implementation of the mathematical definition of causality/conflict).

We run Dpu with an optimal exploration over 15 selected programs from Table 1, with 380 to 204K maximal configurations in the unfolding. In total, the 15 unfoldings contain 246 trees (150 thread trees and 96 lock trees) with 5.2M nodes. Figure 3 shows the average depth of the nodes in each tree (subfigure a) and the maximum depth of the trees (subfigure b), for each of the 246 trees.

**Fig. 3.** (a), (b) Depths of trees; (c), (d) frequency of depth distances

While the average depth of a node is 22.7, as much as 80% of the trees have a maximum depth of less than 8 nodes, and 90% of them less than 16 nodes. The average of 22.7 is however larger because deeper trees contain proportionally more nodes. The depth of the deepest node of every tree was between 3 and 77.

We next evaluate depth differences in the causality and conflict queries over these trees. Figure 3(c) and (d) respectively show the frequency of various depth distances associated with the causality and conflict queries made by optimal POR.

Surprisingly, depth differences are very small for both causality and conflict queries. When deciding causality between events, as much as 92% of the queries were for tree nodes separated by a distance between 1 and 4, and 70% had a difference of 1 or 2 nodes. This means that optimal POR, and specifically the procedure that adds *ex* (C) to the unfolding (which is the main source of causality queries), systematically performs causality queries which are trivial with the proposed data structures. The situation is similar for checking conflicts: 82% of the queries are about tree nodes whose depth difference is between 1 and 4.

These experiments show that most queries on the causality trees require very short walks, which strongly supports the use of the data structure proposed in Sect. 5. Finally, we chose a (rather arbitrary) skip step of 4. We observed that other values do not significantly impact the run time or memory consumption for most benchmarks, since the depth difference in causality/conflict queries is very low.

### **6.3 Evaluation Against the State-of-the-Art on System Code**

We now evaluate the scalability and applicability of Dpu on five multithreaded programs in two Debian packages: *blktrace* [5], a block layer I/O tracing mechanism, and *mafft* [12], a tool for multiple alignment of amino acid or nucleotide sequences. The code size of these utilities ranges from 2K to 40K LOC, and *mafft* is parametric in the number of threads.

We compared Dpu against Maple [24], a state-of-the-art testing tool for multithreaded programs, as the top-ranked verification tools from SV-COMP'17 are still unable to cope with such large and complex multithreaded code. Unfortunately, we could not compare against Nidhugg because it cannot deal with the (abundant) C-library calls in these programs.

Table 2 presents our experimental results. We use Dpu with optimal exploration and the modified version of Maple used in [22]. To test the effectiveness of both approaches in state-space coverage and bug finding, we introduce bugs in 4 of the benchmarks (Add, Dnd, Mdl, Pla). For the safe benchmark Blk, we perform exhaustive state-space exploration using Maple's DFS

**Table 2.** Comparing DPU with Maple (same machine). LOC: lines of code; Execs: nr. of executions; R: safe or unsafe. Other columns as before. Timeout: 8 min.


mode. On this benchmark, Dpu outperforms Maple by several orders of magnitude: Dpu explores up to 20K executions covering the entire state space in 10 s, while Maple explores only up to 108 executions in 8 min.

For the remaining benchmarks, we use the random scheduler of Maple, considered to be the best baseline for bug finding [22]. First, we run Dpu to obtain a bound on the number of executions, and then check whether both tools are able to find the bug within that number of random executions. Maple found bugs in all buggy programs (except for one variant of Add), even though Dpu greatly outperforms it and achieves much higher state-space coverage.

### **6.4 Profiling a Stateless POR**

In order to understand the cost of each component of the algorithm, we profile Dpu on a selection of 7 programs from Table 1. Dpu spends between 30% and 90% of the run time executing the program (65% on average). The remaining time is spent computing alternatives, distributed as follows: adding events to the event structure (15% to 30%), building the spikes of a new comb (1% to 50%), searching for solutions in the comb (less than 5%), and computing conflicting extensions (less than 5%). Counterintuitively, building the *comb* is more expensive than exploring it, even in the optimal case. Filling the spikes seems to be more memory-intensive than exploring the comb, which exploits data locality.

# **7 Conclusion**

We have shown that computing alternatives in an optimal DPOR exploration is NP-complete. To mitigate this problem, we introduced a new approach that computes alternatives in polynomial time, approximating the optimal exploration up to a user-defined constant. Experiments conducted on benchmarks including Debian packages show that our implementation outperforms current verification tools and that the proposed data structures are efficient in practice. Our profiling results show that running the program is often more expensive than computing alternatives; hence, efforts to reduce the number of redundant executions, even if they are themselves costly, are likely to reduce the overall execution time.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **On the Completeness of Verifying Message Passing Programs Under Bounded Asynchrony**

Ahmed Bouajjani¹, Constantin Enea¹, Kailiang Ji¹, and Shaz Qadeer²

¹ IRIF, University Paris Diderot and CNRS, Paris, France
{abou,cenea,jkl}@irif.fr
² Microsoft Research, Redmond, USA
qadeer@microsoft.com

**Abstract.** We address the problem of verifying message passing programs, defined as a set of processes communicating through unbounded FIFO buffers. We introduce a bounded analysis that explores a special type of computations, called k-synchronous. These computations can be viewed as (unbounded) sequences of interaction phases, each phase allowing at most k send actions (by different processes), followed by a sequence of receives corresponding to sends in the same phase. We give a procedure for deciding k*-synchronizability* of a program, i.e., whether every computation is equivalent (has the same happens-before relation) to one of its k-synchronous computations. We show that reachability over k-synchronous computations and checking k-synchronizability are both PSPACE-complete.

# **1 Introduction**

Communication with asynchronous message passing is widely used in concurrent and distributed programs implementing various types of systems such as cache coherence protocols, communication protocols, protocols for distributed agreement, device drivers, etc. An asynchronous message passing program is built as a collection of processes running in parallel, communicating asynchronously by sending messages to each other via channels or message buffers. Messages sent to a given process are stored in its entry buffer, waiting for the moment they will be received by the process. Sending messages is not blocking for the sender process, which means that the message buffers are supposed to be of unbounded size.

Such programs are hard to get right. Asynchrony introduces a tremendous number of new possible interleavings between actions of parallel processes and makes it very hard to grasp the effect of all of their computations. Due

This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 678177).

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 372–391, 2018. https://doi.org/10.1007/978-3-319-96142-2\_23

to this complexity, verifying properties (invariants) of such systems is hard. In particular, when buffers are ordered (FIFO buffers), the verification of invariants (or, dually, of reachability queries) is undecidable even when each process is finite-state [10].

Therefore, an important issue is the design of verification approaches that avoid considering the full set of computations while still drawing useful conclusions about the correctness of the considered programs. Several such approaches have been proposed, including partial-order techniques and bounded analysis techniques, e.g., [4,6,13,16,23]. Due to the hardness of the problem and its undecidability, these techniques have different limitations: they are either applicable only when buffers are bounded (e.g., partial-order techniques), or limited in scope, or provide no guarantee of termination or insight into the completeness of the analysis.

In this paper, we propose a new approach for the analysis and verification of asynchronous message-passing programs with unbounded FIFO buffers, which provides a decision procedure for checking state reachability for a wide class of programs and is also applicable for bounded analysis in the general case.

We first define a bounding concept for prioritizing the enumeration of program behaviors. This concept is guided by our conviction that the behaviors of well-designed programs can be seen as successions of *bounded interaction phases*, each of them being a sequence of send actions (by different processes), followed by a sequence of receive actions (again by different processes) corresponding to send actions belonging to the same interaction phase. For instance, interaction phases corresponding to *rendezvous communications* are formed of a single send action followed immediately by its corresponding receive. More complex interactions are the result of exchanges of messages between processes. For instance, two processes can send messages to each other, and therefore their interaction starts with two send actions (in any order), followed by the two corresponding receive actions (again in any order). This exchange schema can be generalized to any number of processes. We say that an interaction phase is k*-bounded*, for a given k > 0, if its number of send actions is less than or equal to k. For instance, rendezvous interactions are precisely the 1-bounded phases. In general, we call k*-exchange* any k-bounded interaction phase. Given k > 0, we consider that a computation is k*-synchronous* if it is a succession of k-exchanges. It can be seen that in k-synchronous computations the sum of the sizes of all message buffers is bounded by k. However, as will be explained later, boundedness of the message buffers does not guarantee that there is a k such that all computations are k-synchronous.
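As an illustration, the sketch below checks whether a given action sequence is a succession of k-exchanges. The encoding is ours, not the paper's: actions are ('S', id) or ('R', id), a matching send/receive pair shares an id, and process identities and causal delivery are ignored:

```python
def is_succession_of_k_exchanges(trace, k):
    """Greedy check: each phase takes up to k sends, then any receives
    matching sends of that same phase; a receive of a message sent in
    an earlier phase refutes the property.  Taking as many sends as
    allowed is never worse, so the greedy cut is complete."""
    assert k >= 1
    i, n = 0, len(trace)
    while i < n:
        sent = set()
        while i < n and trace[i][0] == 'S' and len(sent) < k:
            sent.add(trace[i][1])
            i += 1
        while i < n and trace[i][0] == 'R':
            if trace[i][1] not in sent:
                return False       # receive crosses a phase boundary
            sent.discard(trace[i][1])
            i += 1
    return True

# Two processes sending to each other ("crossing" sends):
crossing = [('S', 1), ('S', 2), ('R', 1), ('R', 2)]
assert not is_succession_of_k_exchanges(crossing, 1)   # a 2-exchange
assert is_succession_of_k_exchanges(crossing, 2)
# Rendezvous-like computation: 1-synchronous.
assert is_succession_of_k_exchanges([('S', 1), ('R', 1), ('S', 2), ('R', 2)], 1)
```

Note that this checks only the given interleaving; k-synchronizability, introduced below, instead asks whether every computation is *equivalent* to such a succession.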

Then, we introduce a new bounded analysis which for a given k, considers only computations that are *equivalent* to k-synchronous computations. The equivalence relation on computations is based on a notion of *trace* corresponding to a *happens-before* relation capturing the program order (the order of actions in the code of a process) and the precedence order between sends and their corresponding receives. Two computations are equivalent if they have the same trace, i.e., they differ only in the order of causally independent actions. We show that this analysis is PSPACE-complete when processes have a finite number of states.

An important feature of our bounding concept is that it is possible to decide its completeness for systems composed of finite-state processes, but with unbounded message buffers: for any given k, it is possible to decide whether every computation of the program (under the asynchronous semantics) is equivalent to (i.e., has the same trace as) a k-synchronous computation of that program. When this holds, we say that the program is k*-synchronizable*¹. Knowing that a program is k-synchronizable allows us to conclude that an invariant holds for all computations of the program if no invariant violations have been found by its k-bounded exchange analysis. Notice that k-synchronizability of a program *does not* imply that all its behaviours use bounded buffers. Consider for instance a program with two processes: a producer that consists of a loop of sends, and a consumer that consists of a loop of receives. Although there are computations where the entry buffer of the consumer is arbitrarily large, the program is 1-synchronizable because all its computations are equivalent to computations where each message sent by the producer is immediately received by the consumer.

Importantly, we show that checking k-synchronizability of a program, with possibly infinite-state processes, can be reduced in linear time to checking state reachability under the k-synchronous semantics (i.e., without considering all the program computations). Therefore, for finite-state processes, checking k-synchronizability is PSPACE, and it is possible to decide invariant properties without dealing with unbounded message buffers when the programs are k-synchronizable (the overall complexity being PSPACE).

A method for verifying asynchronous message passing programs can thus be defined by iterating k-bounded analyses with increasing values of k, starting from k = 1. If for some k a violation (i.e., reachability of an error state) is detected, then the iteration stops and the conclusion is that the program is not correct. On the other hand, if for some k the program is shown to be k-synchronizable and no violations have been found, then again the iteration terminates and the conclusion is that the program is correct.

However, it is possible that the program is *not* k-synchronizable for any k. In this case, if the program is correct then the iteration above will not terminate. Thus, an important issue is to determine whether a program is *synchronizable*, i.e., *there exists a* k *such that the program is* k*-synchronizable*. This problem is hard, and we believe that it is undecidable, but we do not have a formal proof.

We have applied our theory to a set of nontrivial examples, two of them being presented in Sect. 2. All the examples are synchronizable, which confirms our conviction that non-synchronizability should correspond to an ill-designed system (and therefore it should be reported as an anomaly).

An extended version of this paper with missing proofs can be found at [9].

¹ A different notion of synchronizability has been defined in [4] (see Sect. 8).

# **2 Motivating Examples**

We provide in this section examples illustrating the relevance and the applicability of our approach. Figure 1 shows a *commit protocol* allowing a client to update a memory that is replicated in two processes, called *nodes*. The access to the nodes is controlled by a manager. Figure 2 shows an execution of this protocol. This system is 1-synchronizable, i.e., every execution is equivalent to one where only rendezvous communication is used. Intuitively, this holds because mutually interacting components are never in the situation where messages sent from one to the other are crossing messages sent in the other direction (i.e., the components are "talking" to each other at the same time). For instance, the execution in Fig. 2 is 1-synchronizable because its *conflict graph* (shown in the same figure) is acyclic. Nodes in the conflict graph are matching send-receive pairs (numbered from 1 to 6 in the figure), and edges correspond to the program order between actions in these pairs. The label of an edge records whether the actions related by program order are sends or receives, e.g., the edge from 1 to 2 labeled by RS represents the fact that the receive of the send-receive pair 1

**Fig. 1.** A distributed commit protocol. Each process is defined as a labeled transition system. Transitions are labeled by send and receive actions, e.g., send(c, m, update) is a send from the client c to the manager m with payload update. Similarly, rec(c, OK) denotes process c receiving a message OK.

**Fig. 2.** An execution of the distributed commit protocol and its conflict graph.

is before the send of the send-receive pair 2, in program order. For the moment, these labels should be ignored, their relevance will be discussed in Sect. 5. The conflict graph being acyclic means that matching pairs of send-receive actions are "serializable", which implies that this execution is equivalent to one where every send is immediately followed by the matching receive (as in rendezvous communication).
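The acyclicity criterion illustrated by these examples is straightforward to check mechanically. In the sketch below (our own encoding, not from the paper), an execution is given as a per-process sequence of actions tagged with the id of their matched send-receive pair:

```python
def conflict_graph(per_process):
    """Edges of the conflict graph: pair a -> pair b whenever some
    action of a precedes some action of b in one process's order."""
    edges = set()
    for actions in per_process.values():
        for i in range(len(actions)):
            for j in range(i + 1, len(actions)):
                a, b = actions[i][1], actions[j][1]
                if a != b:
                    edges.add((a, b))
    return edges

def acyclic(nodes, edges):
    """DFS with colors; an acyclic conflict graph means the execution
    is equivalent to a rendezvous (1-synchronous) one."""
    succ = {v: [] for v in nodes}
    for a, b in edges:
        succ[a].append(b)
    color = {v: 0 for v in nodes}      # 0 new, 1 on stack, 2 done
    def dfs(v):
        color[v] = 1
        for w in succ[v]:
            if color[w] == 1 or (color[w] == 0 and not dfs(w)):
                return False
        color[v] = 2
        return True
    return all(color[v] == 2 or dfs(v) for v in nodes)

# Crossing messages (the processes "talk" at the same time): cyclic.
crossing = {'p': [('S', 1), ('R', 2)], 'q': [('S', 2), ('R', 1)]}
assert not acyclic({1, 2}, conflict_graph(crossing))
# Request/reply interaction: acyclic, hence 1-synchronizable.
serial = {'p': [('S', 1), ('R', 2)], 'q': [('R', 1), ('S', 2)]}
assert acyclic({1, 2}, conflict_graph(serial))
```

The edge labels (SS, SR, RS, RR) used later in Sect. 5 are dropped here, since plain acyclicity is all the 1-synchronous argument needs.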

Although the message buffers are bounded in all the computations of the commit protocol, this is not true for every 1-synchronizable system: there are asynchronous computations where buffers reach an arbitrarily large size yet are equivalent to synchronous computations. This is illustrated by a (family of) computations shown in Fig. 4a of the system modeling an elevator, described in Fig. 3 (a simplified version of the system described in [14]). This system consists of three processes: User models the user of the elevator, Elevator models the elevator's controller, and Door models the elevator's door, which reacts to commands received from the controller. The execution in Fig. 4a models an interaction where the user sends an unbounded number of requests for closing the door, which generates an unbounded number of messages in the entry buffer of Elevator. These computations are 1-synchronizable since they are equivalent to a 1-synchronous computation where Elevator immediately receives every message sent by User. This is witnessed by the acyclicity of the conflict graph of this computation (shown on the right of the same figure). It can be checked that the elevator system without the dashed edge is a 1-synchronous system.

Consider now a slightly different version of the elevator system where the transition from Stopping2 to Opening2 is moved to target Opening1 instead (see the dashed transition in Fig. 3). It can be seen that this version reaches exactly the same set of configurations (tuples of process local states) as the previous one. Indeed, modifying that transition enables Elevator to send a message open to Door, but the latter can only be at StopDoor, OpenDoor, or ResetDoor at this point, and therefore it can (maybe after sending doorStopped and doorOpened) receive the message open at state ResetDoor. However, receiving this message does not change Door's state, and the set of reachable configurations of the system remains the same. This version of the system is not 1-synchronizable, as shown in Fig. 4b: once the doorStopped message sent by Door is received by Elevator², these two processes can send messages to each other at the same time (the two send actions happen before the corresponding receives). This mutual interaction, consisting of 2 parallel send actions, is called a 2*-exchange* and is witnessed by the cycle of size 2 in the execution's conflict graph (shown on the right of Fig. 4b). In general, it can be shown that every execution of this version of the elevator system has a conflict graph with cycles of size at most 2, which implies that it is 2-synchronizable (by the results in Sect. 5).

<sup>2</sup> Door sends the message from state StopDoor, and Elevator is at state Stopping2 before receiving the message.

# **3 Message Passing Systems**

We define a message passing system as the composition of a set of processes that exchange messages, which can be stored in FIFO buffers before being received (we assume one buffer per process, storing the incoming messages from all the other processes). Each process is described as a state machine that evolves by executing send or receive actions. An execution of such a system can be represented abstractly using a partially ordered set of events, called a *trace*. The partial order in a trace represents the causal relation between events. We show that these systems satisfy *causal delivery*, i.e., the order in which messages are received by a process is consistent with the causal relation between the corresponding send actions.

**Fig. 3.** A system modeling an elevator.

We fix sets P and V of process ids and message payloads, and sets S = {send(p, q, v) : p, q ∈ P, v ∈ V} and R = {rec(q, v) : q ∈ P, v ∈ V} of *send actions* and *receive actions*. Each send action send(p, q, v) combines two process ids p, q denoting the sender and the receiver of the message, respectively, with a message payload v. A receive action specifies the process q receiving the message and the message payload v. The process executing an action a ∈ S ∪ R is denoted proc(a), i.e., proc(a) = p for a = send(p, q, v) or a = rec(p, v), and the destination q of a send s = send(p, q, v) ∈ S is denoted dest(s). The set of send, resp., receive, actions a of process p, i.e., with proc(a) = p, is denoted by S*p*, resp., R*p*.

A *message passing system* is a tuple S = ((L*p*, δ*p*, l<sup>0</sup>*p*) | p ∈ P) where L*p* is the set of local states of process p, δ*p* ⊆ L*p* × (S*p* ∪ R*p*) × L*p* is a transition relation describing the evolution of process p, and l<sup>0</sup>*p* is the initial state of process p. Examples of message passing systems can be found in Figs. 1 and 3.

We fix a set M of message identifiers, and the sets S*id* = {s*i* : s ∈ S, i ∈ M} and R*id* = {r*i* : r ∈ R, i ∈ M} of indexed actions. Message identifiers are used to pair send and receive actions. We denote the message id of an indexed send/receive action a by msg(a). Indexed send and receive actions s ∈ S*id* and r ∈ R*id* are *matching*, written s ↦ r, when msg(s) = msg(r).

A configuration c = (*l*, *b*) is a vector *l* of local states along with a vector *b* of message buffers (sequences of message payloads tagged with message identifiers). The transition relation *a*−→ (with label a ∈ S*id* ∪ R*id*) between configurations is defined as expected: every send action enqueues the message into the destination's buffer, and every receive dequeues a message from the receiver's buffer. An execution of a system S under the asynchronous semantics is a sequence of indexed actions which corresponds to applying a sequence of transitions from the initial configuration (where processes are in their initial states and the buffers are empty). Let asEx(S) denote the set of these executions. Given an execution e, a send action s in e is called an *unmatched send* when e contains no receive action r such that s ↦ r. An execution e is called *matched* when it contains no unmatched send.
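To make the asynchronous semantics concrete, the following is a small Python sketch (ours, not from the paper): executions are replayed against one FIFO buffer per process, with actions encoded as tuples carrying the message identifier.

```python
from collections import deque

def run_async(execution):
    """Replay an execution of the asynchronous semantics: one FIFO
    buffer per process; a send enqueues the message (tagged with its
    id) into the destination's buffer, and a receive dequeues the head
    of the receiver's buffer."""
    buffers = {}  # process id -> FIFO queue of (message id, payload)
    for a in execution:
        if a[0] == "send":            # ("send", i, p, q, v): p sends v to q
            _, i, p, q, v = a
            buffers.setdefault(q, deque()).append((i, v))
        else:                         # ("rec", i, q, v): q receives v
            _, i, q, v = a
            assert buffers[q].popleft() == (i, v), "not the buffer head"
    return buffers

# A matched execution: two sends to q, received in FIFO order.
final = run_async([("send", 1, "p1", "q", "m1"), ("send", 2, "p2", "q", "m2"),
                   ("rec", 1, "q", "m1"), ("rec", 2, "q", "m2")])
assert all(not b for b in final.values())

# An unmatched send stays in the destination's buffer.
assert list(run_async([("send", 1, "p", "q", "m")])["q"]) == [(1, "m")]
```

The single buffer per receiving process, fed by all senders, is what makes causal delivery (Sect. 3) hold for these executions.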

**Traces.** Executions are represented using traces which are sets of indexed actions together with a *program order* relating every two actions of the same process and a *source* relation relating a send with the matching receive (if any).

**Fig. 4.** Executions of the elevator.

Formally, a *trace* is a tuple t = (A, po, src) where A ⊆ S*id* ∪ R*id*, po ⊆ A<sup>2</sup> defines a total order between the actions of each process, and src ⊆ S*id* × R*id* is a relation such that src(a, a') iff a ↦ a'. The *trace* tr(e) of an execution e is (A, po, src) where A is the set of all actions in e, po(a, a') iff proc(a) = proc(a') and a occurs before a' in e, and src(a, a') iff a ↦ a'. Examples of traces can be found in Figs. 2 and 4. The union of po and src is acyclic. Let asTr(S) = {tr(e) : e ∈ asEx(S)} be the set of traces of S under the asynchronous semantics.

Traces abstract away the order of non-causally related actions, e.g., two sends of different processes that could be executed in any order. Two executions have the same trace when they differ only in the order between such actions. Formally, given an execution e = e1·a·a'·e2 with tr(e) = (A, po, src), where e1, e2 ∈ (S*id* ∪ R*id*)<sup>∗</sup> and a, a' ∈ S*id* ∪ R*id*, we say that e' = e1·a'·a·e2 is derived from e by a *valid swap* iff (a, a') ∉ po ∪ src. A permutation e' of an execution e is *conflict-preserving* when e' can be derived from e through a sequence of valid swaps. For simplicity, whenever we use the term permutation we mean conflict-preserving permutation. For instance, a permutation of send1(p1, q, ) send2(p2, q, ) rec1(q, ) rec2(q, ) is send1(p1, q, ) rec1(q, ) send2(p2, q, ) rec2(q, ), and a permutation of the execution send1(p1, q1, ) send2(p2, q2, ) rec2(q2, ) rec1(q1, ) is send1(p1, q1, ) rec1(q1, ) send2(p2, q2, ) rec2(q2, ).

Note that executions having the same trace are permutations of one another. Also, a system S cannot distinguish between permutations of executions or, equivalently, executions having the same trace.
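The trace abstraction can likewise be sketched in Python (again our own tuple encoding, not the paper's): two executions are conflict-preserving permutations of one another exactly when this function returns the same triple for both.

```python
def trace(execution):
    """Compute tr(e) = (A, po, src): the set of indexed actions, the
    per-process (program) order po, and the source relation src pairing
    each send with its matching receive via the shared message id."""
    def proc(a):                      # the process executing the action
        return a[2]
    po, per_proc = set(), {}
    for a in execution:
        for b in per_proc.get(proc(a), []):
            po.add((b, a))            # b occurs before a on the same process
        per_proc.setdefault(proc(a), []).append(a)
    src = {(s, r) for s in execution if s[0] == "send"
           for r in execution if r[0] == "rec" and r[1] == s[1]}
    return frozenset(execution), frozenset(po), frozenset(src)

# Receiving the first message early is a valid swap with the second
# (independent) send, so both executions have the same trace.
e1 = [("send", 1, "p1", "q", "m1"), ("send", 2, "p2", "q", "m2"),
      ("rec", 1, "q", "m1"), ("rec", 2, "q", "m2")]
e2 = [("send", 1, "p1", "q", "m1"), ("rec", 1, "q", "m1"),
      ("send", 2, "p2", "q", "m2"), ("rec", 2, "q", "m2")]
assert trace(e1) == trace(e2)
```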

**Causal Delivery.** The asynchronous semantics ensures a property known as *causal delivery*, which, intuitively, says that the order in which messages are received by a process q is consistent with the "causal" relation between them. Two messages are causally related if, for instance, they were sent by the same process p, or one of the messages was sent by a process p after the other one was received by the same process p. This property is ensured by the fact that the message buffers have a FIFO semantics and a sent message is instantaneously enqueued in the destination's buffer. For instance, the trace (execution) on the left of Fig. 5 satisfies causal delivery. In particular, the messages v1 and v3 are causally related, and they are received in this order by q2. On the right of Fig. 5, we give a trace where the messages v1 and v3 are causally related but received in a different order by q2, thus violating causal delivery. This trace is not valid because the message v1 would be enqueued in the buffer of q2 before send(p, q1, v2) is executed and thus, before send(q1, q2, v3) as well.

**Fig. 5.** A trace satisfying causal delivery (on the left) and a trace violating causal delivery (on the right).

**Fig. 6.** An execution of the 1-synchronous semantics.

Formally, for a trace t = (A, po, src), the transitive closure of po ∪ src, denoted by ↷*t*, is called the *causal relation* of t. For instance, for the trace t on the left of Fig. 5, we have that send(p, q2, v1) ↷*t* send(q1, q2, v3). A trace t satisfies *causal delivery* if for every two send actions s1 and s2 in A,

$$\begin{aligned} (s_1 \curvearrowright_t s_2 \land \mathsf{dest}(s_1) = \mathsf{dest}(s_2)) \implies {} & (\nexists r_2 \in A.\ s_2 \mapsto r_2)\ \lor \\ & (\exists r_1, r_2 \in A.\ s_1 \mapsto r_1 \land s_2 \mapsto r_2 \land (r_2, r_1) \notin po) \end{aligned}$$

It can be easily proved that every trace t ∈ asTr(S) satisfies causal delivery.
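As a sanity check, the causal delivery condition can be evaluated directly on a finite trace. The sketch below (our encoding; `po` is the full per-process order, `src` the matching relation) computes the causal relation as a naive transitive closure and tests the condition on a scenario shaped like Fig. 5.

```python
from itertools import product

def satisfies_causal_delivery(actions, po, src):
    """For sends s1, s2 with s1 causally before s2 (transitive closure
    of po ∪ src) and the same destination, require that s2 is unmatched,
    or that both are matched and s1's receive comes first."""
    causal = set(po) | set(src)
    changed = True                     # naive transitive closure
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(causal), list(causal)):
            if b == c and (a, d) not in causal:
                causal.add((a, d))
                changed = True
    match = dict(src)                  # send -> matching receive
    sends = [a for a in actions if a[0] == "send"]
    for s1, s2 in product(sends, sends):
        if (s1, s2) in causal and s1[3] == s2[3]:        # same dest
            if s2 in match and not (s1 in match and (match[s2], match[s1]) not in po):
                return False
    return True

# A Fig. 5-like scenario: p sends v1 to q2 and v2 to q1; q1 receives v2
# and then sends v3 to q2.
s1, s2 = ("send", 1, "p", "q2", "v1"), ("send", 2, "p", "q1", "v2")
s3 = ("send", 3, "q1", "q2", "v3")
r1, r2, r3 = ("rec", 1, "q2", "v1"), ("rec", 2, "q1", "v2"), ("rec", 3, "q2", "v3")
acts = [s1, s2, s3, r1, r2, r3]
src = {(s1, r1), (s2, r2), (s3, r3)}
ok_po = {(s1, s2), (r2, s3), (r1, r3)}    # q2 receives v1 before v3
bad_po = {(s1, s2), (r2, s3), (r3, r1)}   # q2 receives v3 before v1
assert satisfies_causal_delivery(acts, ok_po, src)
assert not satisfies_causal_delivery(acts, bad_po, src)
```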

### **4 Synchronizability**

We define a property of message passing systems called k*-synchronizability* as the equality between the set of traces generated by the asynchronous semantics and the set of traces generated by a particular semantics called k*-synchronous*.

The k-synchronous semantics uses an extended version of the standard rendezvous primitive where more than one process is allowed to send a message and a process can send multiple messages, but all these messages must be received before any process is allowed to send more messages. This primitive is called a k*-exchange* if the number of sent messages is at most k. For instance, the execution send1(p1, p2, ) send2(p2, p1, ) rec1(p2, ) rec2(p1, ) is an instance of a 2-exchange. To ensure that the k-synchronous semantics is prefix-closed (if it admits an execution, then it admits all its prefixes), we allow messages to be dropped during a k-exchange transition. For instance, the prefix of the previous execution without the last receive (rec2(p1, )) is also an instance of a 2-exchange. The presence of unmatched send actions must be constrained in order to ensure that the set of executions admitted by the k-synchronous semantics satisfies causal delivery. Consider, for instance, the sequence of 1-exchanges in Fig. 6: a 1-exchange with one unmatched send, followed by two 1-exchanges with matching pairs of send/receives. The receive action rec(q2, v3), pictured as an empty box, needs to be disabled in order to exclude violations of causal delivery. To this end, the semantics tracks for each process p a set of processes B(p) from which it is forbidden to receive messages. For the sequence of 1-exchanges in Fig. 6, the unmatched send(p, q2, v1) disables any receive by q2 of a message sent by p (otherwise, it would even violate the FIFO semantics of q2's buffer). Therefore, the first 1-exchange results in B(q2) = {p}. The second 1-exchange (the message from p to q1) forbids q2 to receive any message from q1: such a message would necessarily be causally related to v1, and receiving it would lead to a violation of causal delivery. Therefore, when reaching send(q1, q2, v3), the receive rec(q2, v3) is disabled because q1 ∈ B(q2).

$$
\textsc{k-exchange}\quad
\frac{\begin{array}{c}
e \in S_{id}^{*} \cdot R_{id}^{*} \qquad |e| \leq 2 \cdot k \qquad (\vec{l}, \vec{\varepsilon}) \xrightarrow{e} (\vec{l}\,', \vec{b}), \text{ for some } \vec{b} \\
\forall s, r \in e.\ s \mapsto r \implies \mathsf{proc}(s) \notin B(\mathsf{proc}(r)) \\
B' \text{ is the smallest function s.t. } B(q) \subseteq B'(q) \text{ for all } q \in P, \\
\mathsf{proc}(s) \in B'(\mathsf{dest}(s)) \text{ for every unmatched send } s \in e, \text{ and} \\
\mathsf{dest}(s) \in B'(q) \text{ whenever } s \mapsto r \text{ for some } r \in e \text{ and } \mathsf{proc}(s) \in B'(q)
\end{array}}{(\vec{l}, B) \Rightarrow_k (\vec{l}\,', B')}
$$

**Fig. 7.** The synchronous semantics. Above, ε denotes the vector of buffers where all the components are the empty word, and *e*−→ is the transition relation of the asynchronous semantics.

Formally, a configuration c = (*l*, B) in the synchronous semantics is a vector *<sup>l</sup>* of local states together with a function <sup>B</sup> : <sup>P</sup> <sup>→</sup> <sup>2</sup><sup>P</sup>. The transition relation <sup>⇒</sup>*<sup>k</sup>* is defined in Fig. 7. A <sup>k</sup>-exchange transition corresponds to a sequence of transitions of the asynchronous semantics starting from a configuration with empty buffers. The sequence of transitions is constrained to be a sequence of at most k sends followed by a sequence of receives. The receives are enabled depending on previous unmatched sends as explained above, using the function B. The semantics defined by ⇒*<sup>k</sup>* is called the k-synchronous semantics.
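The bookkeeping of the sets B(p) can be reconstructed from the Fig. 6 discussion as a small sketch (a simplified reading of ours; the authoritative update is the rule of Fig. 7): an unmatched send from p to q adds p to B(q), and a matched send whose sender is already forbidden for some process also forbids its receiver for that process, since the receiver's future messages are causally behind the unmatched one.

```python
def exchange_B_update(B, sends):
    """Update the map B of forbidden senders over one exchange.
    `sends` is a list of (sender, receiver, matched) triples."""
    B = {q: set(ps) for q, ps in B.items()}          # copy
    for p, q, matched in sends:
        if matched:
            for forbidden in B.values():             # propagate prohibitions
                if p in forbidden:
                    forbidden.add(q)
        else:
            B.setdefault(q, set()).add(p)            # unmatched send
    return B

# The three 1-exchanges of Fig. 6:
B = exchange_B_update({}, [("p", "q2", False)])      # unmatched send(p, q2, v1)
assert B == {"q2": {"p"}}
B = exchange_B_update(B, [("p", "q1", True)])        # send(p, q1, v2)/rec(q1, v2)
assert B["q2"] == {"p", "q1"}
# send(q1, q2, v3): the matching receive is disabled, since q1 ∈ B(q2).
```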

Executions and traces are defined as in the case of the asynchronous semantics, using ⇒*<sup>k</sup>* for some fixed k instead of →. The set of executions, resp., traces, of S under the k-synchronous semantics is denoted by sEx*k*(S), resp., sTr*k*(S). The executions in sEx*k*(S) and the traces in sTr*k*(S) are called k-synchronous.

An execution e such that tr(e) is k-synchronous is called k-synchronizable. We omit k when it is not important. The set of executions generated by a system S under the k-synchronous semantics is prefix-closed, and therefore the set of its k-synchronizable executions is prefix-closed as well. Also, k-synchronizable and k-synchronous executions are indistinguishable up to permutations.

**Definition 1.** *A message passing system* S *is called* k-synchronizable *when* asTr(S) = sTr*k*(S)*.*

It can be easily proved that k-synchronizable systems reach exactly the same set of local state vectors under the asynchronous and the k-synchronous semantics. Therefore, any assertion checking or invariant checking problem for a k-synchronizable system S can be solved by considering the k-synchronous semantics instead of the asynchronous one. This holds even for the problem of detecting deadlocks. Therefore, all these problems become decidable for finite-state k-synchronizable systems, whereas they are undecidable in the general case (because of the FIFO message buffers).

### **5 Characterizing Synchronous Traces**

**Fig. 8.** A trace and its conflict graph.

We give a characterization of the traces generated by the k-synchronous semantics that uses a notion of *conflict graph* similar to the one used in conflict serializability [27]. The nodes of the conflict graph correspond to pairs of matching actions (a send and a receive) or to unmatched sends, and the edges represent the program order relation between the actions represented by these nodes.

For instance, an execution with an acyclic conflict graph, e.g., the execution in Fig. 2, is "equivalent" to an execution where every receive immediately follows the matching send. Therefore, it is an execution of the 1-synchronous semantics. For arbitrary values of k, the conflict graph may contain cycles, but of a particular form. For instance, traces of the 2-synchronous semantics may contain a cycle of size 2 like the one in Fig. 4(b). More generally, we show that the conflict graph of a k-synchronous trace cannot contain cycles of size strictly bigger than k. However, this class of cycles is not sufficient to precisely characterize the k-synchronous traces. Consider for instance the trace on top of Fig. 8. Its conflict graph contains a cycle of size 4 (shown on the bottom), but the trace is not 4-synchronous. The reason is that the messages tagged by 1 and 4 must be sent during the same exchange transition, but receiving message 4 requires that message 3 be sent after message 2 is received. Therefore, it is not possible to schedule all the send actions before all the receives. Such scenarios correspond to cycles in the conflict graph where at least one receive occurs before a send in the program order (witnessed by the edge labeled by RS). We show that excluding such cycles, in addition to cycles of size strictly bigger than k, yields a precise characterization of k-synchronous traces.

The *conflict-graph* of a trace t = (A, po, src) is the labeled directed graph CG*t* = (V, E, ℓ*E*) where: (1) the set of nodes V includes one node for each pair of matching send and receive actions, and for each unmatched send action in t, and (2) the set of edges E is defined by: (v, v') ∈ E iff there exist actions a ∈ act(v) and a' ∈ act(v') such that (a, a') ∈ po (where act(v) is the set of actions of trace t corresponding to the graph node v). The label of the edge (v, v') records whether a and a' are send or receive actions, i.e., for all X, Y ∈ {S, R}, XY ∈ ℓ*E*(v, v') iff a ∈ X*id* and a' ∈ Y*id*.

A direct consequence of previous results on conflict serializability [27] is that a trace is 1-synchronous whenever its conflict-graph is acyclic. A cycle of a conflict graph CG*<sup>t</sup>* is called *bad* when it contains an edge labeled by RS. Otherwise, it is called *good*. The following result is a characterization of k-synchronous traces.

**Theorem 1.** *A trace* t *satisfying causal delivery is* k*-synchronous iff every cycle in its conflict-graph is good and of size at most* k*.*

Theorem 1 can be used to define a runtime monitoring algorithm for k-synchronizability checking. The monitor records the conflict graph of the trace produced by the system and checks whether it contains some bad cycle, or a cycle of size bigger than k. While this approach requires dealing with unbounded message buffers, the next section shows that this is not necessary: synchronizability violations, if any, can be exposed by executing the system under the *synchronous* semantics.
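Theorem 1 also suggests a direct brute-force check for small traces. The sketch below (our encoding, not the paper's monitor) builds the conflict graph and enumerates its simple cycles by plain DFS, rejecting bad cycles and cycles longer than k.

```python
from itertools import product

def conflict_graph(actions, po, src):
    """One node per matched send/receive pair or unmatched send; an
    edge v -> v' labeled XY when an X-action of v is po-before a
    Y-action of v'."""
    match = dict(src)
    nodes = [(s, match.get(s)) for s in actions if s[0] == "send"]
    edges = {}                        # (i, j) -> subset of {SS, SR, RS, RR}
    for (i, v), (j, w) in product(enumerate(nodes), enumerate(nodes)):
        if i == j:
            continue
        for a, b in product([x for x in v if x], [x for x in w if x]):
            if (a, b) in po:
                lab = ("S" if a[0] == "send" else "R") + ("S" if b[0] == "send" else "R")
                edges.setdefault((i, j), set()).add(lab)
    return nodes, edges

def is_k_synchronous(actions, po, src, k):
    """Theorem 1 as a check: every conflict-graph cycle must be good
    (no RS edge) and of size at most k. Exhaustive, for small traces."""
    nodes, edges = conflict_graph(actions, po, src)
    ok = [True]
    def dfs(start, cur, path, saw_rs):
        for (i, j), labs in edges.items():
            if i != cur:
                continue
            rs = saw_rs or "RS" in labs
            if j == start and (rs or len(path) > k):
                ok[0] = False
            elif j != start and j not in path:
                dfs(start, j, path + [j], rs)
    for v in range(len(nodes)):
        dfs(v, v, [v], False)
    return ok[0]

# The 2-exchange of Fig. 4b, schematically: two processes send to each
# other before either receive happens -> a good cycle of size 2.
s1, s2 = ("send", 1, "p", "q", "a"), ("send", 2, "q", "p", "b")
r1, r2 = ("rec", 1, "q", "a"), ("rec", 2, "p", "b")
acts, po = [s1, s2, r1, r2], {(s1, r2), (s2, r1)}
src = {(s1, r1), (s2, r2)}
assert not is_k_synchronous(acts, po, src, 1)
assert is_k_synchronous(acts, po, src, 2)
```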

# **6 Checking Synchronizability**

We show that checking k-synchronizability can be reduced to a reachability problem under the k*-synchronous* semantics (where message buffers are bounded). This reduction holds for arbitrary, possibly infinite-state, systems. More precisely, since the set of (asynchronous) executions of a system is prefix-closed, if a system S admits a synchronizability violation, then it also admits a *borderline* violation, for which every strict prefix is synchronizable. We show that every *borderline* violation can be "simulated"<sup>3</sup> by the synchronous semantics of an instrumentation of S where the receipt of exactly one message is delayed (during every execution). We describe a monitor that observes executions of the instrumentation (under the synchronous semantics) and identifies synchronizability violations (there exists a run of this monitor that goes to an error state whenever such a violation exists).

### **6.1 Borderline Synchronizability Violations**

For a system S, a violation e to k-synchronizability is called *borderline* when every strict prefix of e is k-synchronizable. Figure 9(a) gives an example of a borderline violation to 1-synchronizability (it is the same execution as in Fig. 4(b)).

We show that every borderline violation e ends with a receive action, and this action is included in every cycle of CG*tr*(*e*) that is bad or exceeds the bound k. Given a cycle c = v, v1, ..., v*n*, v of a conflict graph CG*t*, the node v is called a *critical* node of c when (v, v1) is an SX edge with X ∈ {S, R} and (v*n*, v) is a YR edge with Y ∈ {S, R}.

**Lemma 1.** *Let* e *be a borderline violation to* k*-synchronizability of a system* S*. Then,* e = e' · r *for some* e' ∈ (S*id* ∪ R*id*)<sup>∗</sup> *and* r ∈ R*id. Moreover, the node* v *of* CG*tr*(*e*) *representing* r *(and the corresponding send) is a critical node of every cycle of* CG*tr*(*e*) *which is bad or of size bigger than* k*.*

### **6.2 Simulating Borderline Violations on the Synchronous Semantics**

Let S' be a system obtained from S by "delaying" the reception of exactly one nondeterministically chosen message: S' contains an additional process π, and exactly one message sent by a process in S is non-deterministically redirected

<sup>3</sup> We refer to the standard notion of (stuttering) simulation where one system mimics the transitions of the other system.

**Fig. 9.** A borderline violation to 1-synchronizability.

to π<sup>4</sup>, which sends it to the original destination at a later time<sup>5</sup>. We show that the synchronous semantics of S' "simulates" a permutation of every borderline violation of S. Figure 9(b) shows the synchronous execution of S' that corresponds to the borderline violation in Fig. 9(a). It is essentially the same except for delaying the reception of doorOpened by sending it to π, which relays it to Elevator at a later time.

The following result shows that the k-synchronous semantics of S' "simulates" all the borderline violations of S, modulo permutations.

**Lemma 2.** *Let* e = e<sup>1</sup> · send*i*(p, q, v) · e<sup>2</sup> · rec*i*(q, v) *be a borderline violation to* k*-synchronizability of* S*. Then,* sEx*k*(S') *contains an execution* e' *of the form:*

$$e' = e'\_1 \cdot \text{send}\_i(p, \pi, (q, v)) \cdot \text{rec}\_i(\pi, (q, v)) \cdot e'\_2 \cdot \text{send}\_j(\pi, q, v) \cdot \text{rec}\_j(q, v)$$

*such that* e'<sup>1</sup> · send*i*(p, q, v) · e'<sup>2</sup> *is a permutation of* e<sup>1</sup> · send*i*(p, q, v) · e<sup>2</sup>*.*
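The shape of the instrumented execution in Lemma 2 can be illustrated on a tuple encoding of executions (a toy sketch of ours, not the paper's construction; the helper process name `"pi"` and the fresh message id are our own):

```python
def delay_through_pi(execution, i, fresh_id):
    """Rewrite e = e1 · send_i(p,q,v) · e2 · rec_i(q,v) into the shape
    of Lemma 2: the message with id i is redirected to a helper process
    "pi", which receives it at once and forwards it (under a fresh id)
    to the original destination at the very end."""
    out, delayed = [], None
    for a in execution:
        if a[0] == "send" and a[1] == i:
            _, _, p, q, v = a
            out.append(("send", i, p, "pi", (q, v)))   # redirected send
            out.append(("rec", i, "pi", (q, v)))       # pi receives immediately
            delayed = (q, v)
        elif a[0] == "rec" and a[1] == i:
            pass                                       # original receive is delayed
        else:
            out.append(a)
    q, v = delayed
    out += [("send", fresh_id, "pi", q, v), ("rec", fresh_id, q, v)]
    return out

# e = send_1(p, q, v) · send_2(p, q2, w) · rec_2(q2, w) · rec_1(q, v)
e = [("send", 1, "p", "q", "v"), ("send", 2, "p", "q2", "w"),
     ("rec", 2, "q2", "w"), ("rec", 1, "q", "v")]
assert delay_through_pi(e, 1, 3) == [
    ("send", 1, "p", "pi", ("q", "v")), ("rec", 1, "pi", ("q", "v")),
    ("send", 2, "p", "q2", "w"), ("rec", 2, "q2", "w"),
    ("send", 3, "pi", "q", "v"), ("rec", 3, "q", "v")]
```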

Checking k-synchronizability of S on the system S' would require that every (synchronous) execution of S' can be transformed into an execution of S by applying a homomorphism σ where the send/receive pair with destination π is replaced by the original send action and the send/receive pair initiated by π is replaced by the original receive action (all the other actions are left unchanged). However, this is not true in general. For instance, S' may admit an execution send*i*(p, π, (q, v)) · rec*i*(π, (q, v)) · send*j*(p, q, v') · rec*j*(q, v') · send*i'*(π, q, v) · rec*i'*(q, v) where a message sent after the one redirected to π is received earlier, and the two messages were sent by the same process p. This execution is possible under the 1-synchronous semantics of S'. Applying the homomorphism σ, we get the execution send*i*(p, q, v) · send*j*(p, q, v') · rec*j*(q, v') · rec*i*(q, v), which violates causal delivery and therefore is not admitted by the asynchronous semantics

<sup>4</sup> Meaning that every transition labeled by a send action send(p, q, v) is paired with a transition labeled by send(p, π, (q, v)), and such a send to π is enabled only once throughout the entire execution.

<sup>5</sup> The process π stores the message (q, v) it receives in its state and has one transition where it can send v to the original destination q.

of S. Our solution to this problem is to define a monitor M*causal*, i.e., a process which reads every transition label in the execution and advances its local state, and which excludes such executions of S' under the synchronous semantics: it blocks the system S' whenever applying some transition would lead to an execution which, modulo the homomorphism σ, violates causal delivery. This monitor is based on the same principles that we used to exclude violations of causal delivery in the synchronous semantics in the presence of unmatched sends (the component B of a synchronous configuration).

### **6.3 Detecting Synchronizability Violations**

We complete the reduction of checking k-synchronizability to a reachability problem under the k-synchronous semantics by describing a monitor M*viol*(k), which observes executions in the k-synchronous semantics of S' || M*causal* and checks whether they represent violations to k-synchronizability; M*viol*(k) goes to an error state whenever such a violation exists.

Essentially, M*viol*(k) observes the sequence of k-exchanges in an execution and tracks a conflict graph cycle, if any, interpreting send*i*(p, π, (q, v)) · rec*i*(π, (q, v)) as in the original system S, i.e., as send*i*(p, q, v), and send*i*(π, q, v) · rec*i*(q, v) as rec*i*(q, v). By Lemma 2, every cycle that is a witness for *non*-k-synchronizability includes the node representing the pair (send*i*(p, q, v), rec*i*(q, v)). Moreover, the successor of this node in the cycle represents an action executed by p, and the predecessor an action executed by q. Therefore, the monitor searches for a conflict-graph path from a node representing an action of p to a node representing an action of q. Whenever it finds such a path, it goes to an error state.

Figure 10 lists the definition of M*viol*(k) as an abstract state machine. By the construction of S', we assume w.l.o.g. that both the send to π and the send from π are executed in isolation, as instances of 1-exchange. When observing the send to π, the monitor sets the variable conflict, which in general stores the process executing the last action in the cycle, to p. Also, a variable count, which becomes 0 when the cycle has strictly more than k nodes, is initialized to k. Then, for every k-exchange transition in the execution, M*viol*(k) non-deterministically picks pairs of matching send/receive or unmatched sends to extend the conflict-graph path, knowing that the last node represents an action of the process stored in conflict. The rules for choosing pairs of matching send/receive to advance the conflict-graph path are pictured on the right of Fig. 10 (advancing the conflict-graph path with an unmatched send does not modify the value of conflict; it just decrements the value of count). There are two cases depending on whether the last node in the path conflicts with the send or the receive of the considered pair. One of the two processes involved in this pair of send/receive equals the current value of conflict. Therefore, conflict can either remain unchanged or change to the value of the other process. The variable lastIsRec records whether the current conflict-graph path ends in a conflict due to a receive action. If it does, and the next conflict is between

**Fig. 10.** The monitor <sup>M</sup>*viol* (k). <sup>B</sup> is the set of Booleans and <sup>N</sup> is the set of natural numbers. Initially, conflict is <sup>⊥</sup>, while lastIsRec and sawRS are false.

this receive and a send, then sawRS is set to true to record the fact that the path contains an RS-labeled edge (leading to a potential bad cycle).

When π sends its message to q, the monitor checks whether the conflict-graph path it discovered ends in a node representing an action of q. If this is the case, this path together with the node representing the delayed send forms a cycle. Then, if sawRS is true, the cycle is bad, and if count has reached the value 0, the cycle contains more than k nodes. In both cases, the current execution is a violation to k-synchronizability.

The set of executions in the k-synchronous semantics of S' composed with M*causal* and M*viol*(k), in which the latter goes to an error state, is denoted by S'*k* || M*causal* || ¬M*viol*(k).

**Theorem 2.** *For a given* k*, a system* S *is* k*-synchronizable iff the set of executions* S'*k* || M*causal* || ¬M*viol*(k) *is empty.*

Given a system S, an integer k, and a local state l, *the reachability problem under the* k*-synchronous semantics* asks whether there exists a k-synchronous execution of S reaching a configuration (*l*, B) such that the component of *l* corresponding to some process p ∈ P equals l. Theorem 2 shows that checking k-synchronizability can be reduced to a reachability problem under the k-synchronous semantics. This reduction holds for arbitrary (infinite-state) systems, which implies that k-synchronizability can be checked using the existing assertion checking technology. Moreover, for finite-state systems, where each process has a finite number of local states (message buffers can still be unbounded), it implies that checking this property is PSPACE-complete.

**Theorem 3.** *For a finite-state system* S*, the reachability problem under the* k*-synchronous semantics and the problem of checking* k*-synchronizability of* S *are decidable and PSPACE-complete.*

# **7 Experimental Evaluation**


**Fig. 11.** Experimental results.

As a proof of concept, we have applied our procedure for checking k-synchronizability to a set of examples extracted from the distribution of the P language<sup>6</sup>. Two-phase commit and Elevator are presented in Sect. 2, German is a model of the cache-coherence protocol with the same name, OSR is a model of a device driver, and

Replication Storage is a model of a protocol ensuring eventual consistency of a replicated register. These examples cover common message communication patterns that occur in different domains: distributed systems (Two-phase commit, Replication storage), device drivers (Elevator, OSR), and cache-coherence protocols (German). We have rewritten these examples in the Promela language and used the Spin model checker<sup>7</sup> for discharging the reachability queries. For a given program, its k-synchronous semantics and the monitors defined in Sect. 6 are implemented as ghost code. Finding a conflict-graph cycle which witnesses non-k-synchronizability corresponds to violating an assertion.

The experimental data is listed in Fig. 11: Proc, resp., Loc, is the number of processes, resp., the number of lines of code (loc) of the original program, k is the *minimal* integer for which the program is k-synchronizable, and Time gives the number of minutes needed for this check. The ghost code required to check k-synchronizability takes about 250 lines of code on average.

# **8 Related Work**

Automatic verification of asynchronous message passing systems is undecidable in general [10]. A number of decidable subclasses have been proposed. The class of systems in [4], also called *synchronizable*, requires that a system generate the same sequences of send actions when executed under the asynchronous semantics as when executed under a synchronous semantics based on rendezvous communication. These systems are all 1-synchronizable, but the inclusion is strict (the 1-synchronous semantics allows unmatched sends). The techniques proposed in [4] to check that a system is synchronizable according to their definition cannot be extended to k-synchronizable systems. Other classes of systems that are 1-synchronizable have been proposed in the context of session types,

<sup>6</sup> Available at https://github.com/p-org.

<sup>7</sup> Available at http://spinroot.com.

e.g., [12,20,21,26]. A sound but incomplete proof method for distributed algorithms, based on a similar idea of avoiding reasoning about all program computations, is introduced in [3]. Our class of synchronizable systems also differs from classes of communicating systems that restrict the type of communication, e.g., lossy communication [2] or half-duplex communication [11], or the topology of the interaction, e.g., tree-based communication in concurrent pushdowns [19,23].

The question of deciding whether all computations of a communicating system are equivalent (in the language-theoretic sense) to computations with bounded buffers has been studied in, e.g., [17], where this problem is proved to be undecidable. The link between that problem and our synchronizability problem is not (yet) clear, mainly because non-synchronizable computations may use bounded buffers.

Our work proposes a solution to the question of defining adequate (in terms of coverage and complexity) parametrized bounded analyses for message passing programs, providing the analogues of concepts such as context-bounding or delay-bounding defined for shared-memory concurrent programs. Bounded analysis for concurrent systems was initiated by the work on bounded-context-switch analysis [25,28,29]. For shared-memory programs, this work has been extended to unbounded threads or larger classes of behaviors, e.g., [8,15,22,24]. A few bounded analyses incomparable to ours have been proposed for message passing systems, e.g., [6,23]. Contrary to our work, these works on bounded analyses in general do not propose decision procedures for checking whether the analysis is complete (covers all reachable states). The only exception is [24], which concerns shared memory.

Partial-order reduction techniques, e.g., [1,16], define equivalence classes on behaviors, based on notions of action independence, and explore (ideally) only one representative of each class. This has led to efficient algorithmic techniques for model checking concurrent shared-memory programs that consider only a subset of the relevant action interleavings. In the worst case, however, these techniques still need to explore all interleavings. Moreover, they are not guaranteed to terminate when the buffers are unbounded.

The work in [13] defines a particular class of schedulers that, roughly, prioritize receive actions over send actions, and which is complete in the sense that it allows constructing the whole set of reachable states. An analysis based on this class of schedulers has the same drawbacks as partial-order reductions: in the worst case, it needs to explore all interleavings, and termination is not guaranteed.

The approach in this work is related to robustness checking [5,7]. The general paradigm is to decide whether a program has the same behaviors under two semantics, one weaker than the other, by showing a polynomial reduction to a state reachability problem under the stronger semantics. For instance, in our case, the class of message passing programs with unbounded FIFO channels is Turing powerful, yet, surprisingly, k-synchronizability of these programs is decidable and PSPACE-complete. The results in [5,7] cannot be applied in our context: the classes of programs and their semantics are different, and the corresponding robustness checking algorithms are based on distinct concepts and techniques.

# **References**



# **Constrained Dynamic Partial Order Reduction**

Elvira Albert<sup>1</sup>, Miguel Gómez-Zamalloa<sup>1(B)</sup>, Miguel Isabel<sup>1</sup>, and Albert Rubio<sup>2</sup>

> <sup>1</sup> Complutense University of Madrid, Madrid, Spain
> mzamalloa@fdi.ucm.es
> <sup>2</sup> Universitat Politècnica de Catalunya, Barcelona, Spain

**Abstract.** The cornerstone of dynamic partial order reduction (DPOR) is the notion of *independence* that is used to decide whether each pair of concurrent events *p* and *t* is in a race, and thus whether both *p* · *t* and *t* · *p* must be explored. We present *constrained* dynamic partial order reduction (CDPOR), an extension of the DPOR framework which is able to avoid redundant explorations based on the notion of *conditional independence*: the execution of *p* and *t* commutes only when certain *independence constraints* (ICs) are satisfied. ICs can be declared by the programmer but, importantly, we present a novel SMT-based approach to automatically synthesize ICs in a static pre-analysis. A unique feature of our approach is that we have succeeded in exploiting ICs within the state-of-the-art DPOR algorithm, achieving *exponential* reductions over existing implementations.

# **1 Introduction**

Partial Order Reduction (POR) is based on the idea that two interleavings can be considered equivalent if one can be obtained from the other by swapping adjacent, non-conflicting (*independent*) execution steps. Such an equivalence class is called a *Mazurkiewicz trace*, and POR guarantees that it is sufficient to explore one interleaving per equivalence class. Early POR algorithms [8,10,20] relied on static over-approximations to detect possible *future* conflicts. The Dynamic-POR (DPOR) algorithm, introduced by Godefroid [9] in 2005, was a breakthrough in the area because it does not need to look into the future. It keeps track of the races witnessed along its execution and uses them to decide the required exploration dynamically, without the need for static approximation. DPOR is nowadays considered one of the most scalable techniques for

This work was funded partially by the Spanish MECD Salvador de Madariaga Mobility Grants PRX17/00297 and PRX17/00303, the Spanish MINECO projects TIN2015-69175-C4-2-R and TIN2015-69175-C4-3-R, and by the CM project S2013/ICE-3006.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 392–410, 2018. https://doi.org/10.1007/978-3-319-96142-2_24

software verification. The key to DPOR algorithms is the dynamic construction of two sets at each scheduling point: the *sleep set*, which contains processes whose exploration has been proved redundant (and hence should not be selected), and the *backtrack set*, which contains the processes that have not been proved independent of previously explored steps (and hence need to be explored). Source-DPOR (SDPOR) [1,2] improves the precision of the computed backtrack sets (named *source sets*), proving optimality of the resulting algorithm for *any* number of processes w.r.t. an *unconditional independence* relation.
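The notion of Mazurkiewicz trace underlying all these algorithms can be made concrete with a small brute-force sketch (illustrative Python, not from the paper): interleavings are grouped into classes closed under swapping adjacent independent events, and a POR-style exploration needs only one representative per class.

```python
from itertools import permutations

def mazurkiewicz_classes(events, independent):
    """Group all interleavings of `events` into Mazurkiewicz traces:
    two sequences are equivalent iff one can be obtained from the other
    by repeatedly swapping adjacent independent events."""
    def neighbours(seq):
        for i in range(len(seq) - 1):
            if frozenset(seq[i:i + 2]) in independent:
                yield seq[:i] + (seq[i + 1], seq[i]) + seq[i + 2:]
    remaining = set(permutations(events))
    classes = []
    while remaining:
        seed = remaining.pop()
        cls, frontier = {seed}, [seed]
        while frontier:                  # closure under adjacent swaps
            for nxt in neighbours(frontier.pop()):
                if nxt not in cls:
                    cls.add(nxt)
                    frontier.append(nxt)
        remaining -= cls
        classes.append(cls)
    return classes

# with a and c independent, the 6 interleavings of a, b, c collapse
# into 4 Mazurkiewicz traces, so only 4 need to be explored
classes = mazurkiewicz_classes(("a", "b", "c"), {frozenset("ac")})
assert len(classes) == 4
```

Here the classes are `{abc}`, `{acb, cab}`, `{bac, bca}` and `{cba}`: only the adjacent pair a, c may be swapped.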

*Challenge.* When (S)DPOR is used with unconditional independence, if a pair of events is not independent in all possible executions, they are treated as potentially dependent and their interleavings are explored. Such unnecessary exploration can be avoided using conditional independence. E.g., two processes executing respectively the atomic instructions if(z ≥ 0) z = x; and x = x + 1; would be considered dependent even when z ≤ −1, although z ≤ −1 is in fact an *independence constraint* (IC) for these two instructions. Conditional independence was introduced early in the context of POR [11,15]. The first algorithm to use notions of conditional independence within the state-of-the-art DPOR algorithm is Context-Sensitive DPOR (CSDPOR) [3]. However, CSDPOR does not use ICs (it rather checks state equivalence dynamically during the exploration) and exploits conditional (context-sensitive) independence only *partially*, to extend the sleep sets. Our challenge is twofold: (i) extend the DPOR framework to exploit ICs during the exploration in order to both reduce the backtrack sets and expand the sleep sets as much as possible; (ii) statically synthesize ICs in an automatic pre-analysis.
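The IC claimed for the two instructions above can be verified by brute force (a minimal sketch; representing a state as an (x, z) pair is our choice): the two atomic blocks commute exactly in the states where z ≤ −1.

```python
def q_block(state):
    x, z = state
    return (x, x) if z >= 0 else (x, z)   # if (z >= 0) z = x;

def p_block(state):
    x, z = state
    return (x + 1, z)                     # x = x + 1;

def commutes(state):
    # condition for independence in `state`: both orders agree
    return p_block(q_block(state)) == q_block(p_block(state))

# over a window of states, the two blocks commute exactly when
# z <= -1, i.e. z <= -1 is an independence constraint for this pair
assert all(commutes((x, z)) == (z <= -1)
           for x in range(-5, 6) for z in range(-5, 6))
```

When z ≥ 0, running q first stores the old x into z, while running q second stores x + 1, so the orders disagree; when z ≤ −1, q is a no-op in both orders.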

*Contributions.* The main contributions of this work can be summarized as:


### **2 Background**

In this section we introduce some notation, the basic notions of POR theory, and the state-of-the-art DPOR algorithm that we will extend in Sect. 3.

Our work is formalized for a general model of concurrent systems, in which a program is composed of *atomic blocks* of code. An atomic block can contain just one (global) statement that affects the global state, a sequence of local statements (which only read and write the local state of the process) followed by a global statement, or a block of code with possibly several global statements whose execution cannot interleave with other processes because it has been implemented as atomic (e.g., using locks, semaphores, etc.). Each atomic block in the program is given a unique block identifier. We use *spawn*(*P*[*ini*]) to create a new process. Depending on the programming language, P can be the name of a method and [ini] the initial values of its parameters, or P can be the identifier of the initial block to execute and [ini] the initialization instructions, etc.; in every case there are mechanisms to continue the execution from one block to the following one. Notice that the use of atomic blocks in our formalization generalizes the particular case of considering atomicity at the level of single instructions.

As in previous work on DPOR [1–3], we assume that the state space contains no cycles, that executions have finite (but unbounded) length, and that processes are deterministic (i.e., at a given time there is at most one event a process can execute). Let Σ be the set of states of the system. There is a unique initial state s<sub>0</sub> ∈ Σ. The execution of a process p is represented as a partial function execute<sub>p</sub> : Σ → Σ that moves the system from one state to a subsequent state. Each application of the function execute<sub>p</sub> represents the execution of an *atomic block* of the code that p is running, called an *event* (or execution step) of process p. An *execution sequence* E (also called a *derivation*) of a system is a finite sequence of events of its processes starting from s<sub>0</sub>, and it is uniquely characterized by the sequence of processes that perform the steps of E. For instance, p · q · q denotes the execution sequence that first performs one step of p, followed by two steps of q. We use ε to denote the empty sequence. The state of the system after E is denoted by s[E]. The set of processes enabled in state s (i.e., that can perform an execution step from s) is denoted by *enabled*(*s*).
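This setup can be rendered as a minimal executable sketch (illustrative Python; the two processes run the atomic instructions from the Challenge paragraph, and all names here are ours). A state pairs the set of processes that have already taken their single event with the shared data.

```python
# Each process here has exactly one event; its effect on the shared
# data is the atomic instruction from the Challenge example.
process_effect = {
    "p": lambda d: {**d, "x": d["x"] + 1},                    # x = x + 1
    "q": lambda d: {**d, "z": d["x"]} if d["z"] >= 0 else d,  # if (z >= 0) z = x
}

def execute(process, state):
    """Partial step function execute_p: returns the successor state,
    or None when the process is not enabled (it has already run)."""
    done, data = state
    if process in done:
        return None
    return (done | {process}, process_effect[process](data))

def s(E, s0):
    """s[E]: the state after execution sequence E, written 'p.q';
    the empty string plays the role of the empty sequence."""
    state = s0
    for p in (E.split(".") if E else []):
        state = execute(p, state)
    return state

def enabled(state):
    return {p for p in process_effect if execute(p, state) is not None}

s0 = (frozenset(), {"x": -2, "z": -2})
assert enabled(s0) == {"p", "q"}
assert s("p.q", s0)[1] == {"x": -1, "z": -2}   # q is a no-op since z < 0
```

Since each process has a single event, an execution sequence is indeed characterized by the sequence of process names alone, as in the formalization above.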

### **2.1 Basics of Partial Order Reduction**

An *event* e of the form (p, i) denotes the i-th occurrence of process p in an execution sequence, and ê denotes the process p of event e; this notation is extended to sequences of events in the natural way. We write ē to refer to the identifier of the atomic block of code that event e is executing. The set of events in execution sequence E is denoted by dom(E). We use e <<sub>E</sub> e′ to denote that event e occurs before event e′ in E, s.t. <<sub>E</sub> establishes a total order between the events in E, and E ≤ E′ to denote that sequence E is a prefix of sequence E′. Let dom<sub>[E]</sub>(w) denote the set of events in execution sequence E.w that are in sequence w, i.e., dom(E.w)\dom(E). If w is a single process p, we use next<sub>[E]</sub>(p) to denote the single event in dom<sub>[E]</sub>(p). If P is a set of processes, next<sub>[E]</sub>(P) denotes the set of next<sub>[E]</sub>(p) for all p ∈ P. The core concept in POR is that of the *happens-before* partial order among the events in execution sequence E, denoted by →<sub>E</sub>. This relation is a subset of the total order <<sub>E</sub>, such that any two sequences with the same happens-before order are equivalent. Any linearization E′ of →<sub>E</sub> on dom(E) is an execution sequence with exactly the same happens-before relation →<sub>E′</sub> as →<sub>E</sub>. Thus, →<sub>E</sub> induces a set of equivalent execution sequences, all with the same happens-before relation. We use E ≃ E′ to denote that E and E′ are linearizations of the same happens-before relation. The happens-before partial order has traditionally been defined in terms of a *dependency* relation between

### **Algorithm 1.** (Source+Context-sensitive)+Constrained DPOR algorithm

```
 1: procedure Explore(E)
 2:   if (∃p ∈ (enabled(s[E]) \ sleep(E))) then
 3:     back(E) := {p};
 4:     while (∃p ∈ (back(E) \ sleep(E))) do
 5:       let n = next[E](p);
 6:       for all (e ∈ dom(E) such that e ≲_{E.p} n) do
 7:         let E′ = pre(E, e);
 8:         let u = dep(E, e, n);
 9:         if (¬U⇒(I_{ē,n̄}, e, n, s[E′.û])) then
10:           updateBack(E, E′, e, p);
11:           if C(s[E′.û]) for some C ∈ I_{ē,n̄} then
12:             add û.p.ê to sleep(E′);
13:           else
14:             updateSleepCS(E, E′, e, p);
15:       sleep(E.p) := {x | x ∈ sleep(E), E ⊨ p ♢ x}
16:                   ∪ {x | p.x ∈ sleep(E)}
17:                   ∪ {x | x ∈ sleep(E), |x| = 1, m = next[E](x), U⇒(I_{n̄,m̄}, n, m, s[E])};
18:       Explore(E.p);
19:       sleep(E) := sleep(E) ∪ {p};
```
the execution steps associated with those events [10]. Intuitively, two steps p and q are *dependent* if there is at least one execution sequence E for which they do not commute, either because (i) one *enables* the other (i.e., the execution of p leads to introducing q, or vice versa), or because (ii) s[E.p.q] ≠ s[E.q.p]. We define dep(E, e, n) as the subsequence containing all events e′ in E that occur after e and happen-before n in E.p (i.e., e <<sub>E</sub> e′ and e′ →<sub>E.p</sub> n). The unconditional dependency relation is used to define the concept of a *race* between two events. Event e is said to be in race with event e′ in execution E if the events belong to different processes, e happens-before e′ in E (e →<sub>E</sub> e′), and the two events are "concurrent", i.e., there exists an equivalent execution sequence E′ ≃ E where the two events are adjacent. We write e ≲<sub>E</sub> e′ to denote that e is in race with e′ and that the race can be reversed (i.e., the events can be executed in the reverse order). POR algorithms use this relation to reduce the number of equivalent execution sequences explored, with SDPOR ensuring that only one execution sequence in each equivalence class is explored.

### **2.2 State-of-the-Art DPOR with Unconditional Independence**

Algorithm 1 shows the state-of-the-art DPOR algorithm, based on the SDPOR algorithm of [1,2],<sup>1</sup> which in turn is based on the original DPOR algorithm of [9]. We refer to this algorithm as DPOR in what follows. The context-sensitive extension of CSDPOR [3] (lines 14 and 16) and our extension highlighted in blue

<sup>1</sup> The extension to support *wake-up trees* [2] is deliberately not included to simplify the presentation.

(lines 8–10, 11–13 and 17) should be ignored for now; they will be described in Sect. 3.

The algorithm carries out a depth-first exploration of the execution tree using POR, receiving as parameter a derivation E (initially empty). Essentially, it dynamically finds reversible races and backtracks at the appropriate scheduling points to reverse them. For this purpose, it keeps two sets at every prefix E′ of E: back(E′), the set of processes that must be explored from E′, and sleep(E′), the set of sequences of processes that previous executions have determined need not be explored from E′. Note that in the original DPOR the sleep set contained only single processes, but in later improvements sequences of processes are added, so our description considers this general case. The algorithm starts by selecting any process p that is enabled in the state reached after executing E and is not already in sleep(E). If it does not find any such process p, it stops. Otherwise, after setting back(E) = {p} to start the search, it explores every element in back(E) that is not in sleep(E). The backtrack set of E might grow as the loop progresses (due to later executions of line 10). For each such p, DPOR performs two phases: race detection (lines 6, 7 and 10) and state exploration (lines 15, 18 and 19). The race detection starts by finding all events e in dom(E) such that e ≲<sub>E.p</sub> n, where n is the event being selected (see line 5). For each such e, it sets E′ to pre(E, e), i.e., the prefix of E up to, but not including, e. Procedure updateBack modifies back(E′) in order to ensure that the race between e and n is reversed. The source-set extension of [1,2] detects cases where there is no need to modify back(E′); this is done within procedure updateBack, whose code is not shown because it is not affected by our extension.
After this, the algorithm continues with the state exploration phase for E.p, retaining in its sleep set any element x of sleep(E) whose events in E.p are independent of the next event of p in E (denoted E ⊨ p ♢ x), i.e., any x such that next<sub>[E]</sub>(p) would not happen-before any event in dom(E.p.x)\dom(E.p). Then the algorithm explores E.p, and finally it adds p to sleep(E) to ensure that, when backtracking on E, p is not selected until an event dependent with it is selected. All versions of the DPOR algorithm (except [3]) rely on the unconditional (or context-insensitive) dependency relation. This relation has to be over-approximated, usually by requiring that global variables accessed by one execution step are not modified by the other.

*Example 1.* Consider the example in Fig. 1 with three processes p, q, r, each containing a single atomic block. Since all processes have a single event, by abuse of notation we refer to events by their process names throughout all examples in the paper. Relying on the usual over-approximation of dependency, all three pairs of events are dependent. Therefore, starting with one instance per process, the algorithm has to explore 6 execution sequences, each with a different happens-before relation. The tree, including the dotted and dashed fragments, shows the exploration from the initial state z = −2, x = −2. The value of variable z is shown in brackets at each state. Essentially, in all states of the form E.e, the algorithm always finds a reversible race between the next event of the currently selected process (p, q or r) and e, and adds it to back(E). Also, when backtracking on E, none of the elements in sleep(E) is propagated down, since all events are considered dependent. In the best case, considering an exact (yet unconditional) dependency relation which realizes that events p and r are independent, the algorithm will make the following reductions. In state 6, p and r will not be in race and hence p will not be added to back(q). This avoids exploring the sequence p.r from 5. When backtracking on state 0 with r, where sleep(ε) = {p, q}, p will be propagated down to sleep(r) since ε ⊨ r ♢ p, hence avoiding the exploration of p.q from 8. Thus, the algorithm will explore 4 sequences.

**Fig. 1.** Left: code of working example (up) and ICs (down). Right: execution tree starting from *z* = −2*, x* = −2. Full tree computed by SDPOR, dotted fragment not computed by CSDPOR, and, dashed+dotted fragment not computed by CDPOR.

# **3 DPOR with Conditional Independence**

Our aim in CDPOR is twofold: (1) provide techniques to both infer and soundly check conditional independence, and (2) be able to exploit them at *all* points of the DPOR algorithm where dependencies are used. Section 3.1 reviews the notions of conditional independence and ICs, and introduces a first type of check where ICs can be directly used in the DPOR algorithm. Section 3.2 illustrates why ICs cannot be used at the remaining independence check points in the algorithm, and introduces sufficient conditions to soundly exploit them at those points. Finally, Sect. 3.3 presents the CDPOR algorithm that includes all types of checks.

### **3.1 Using Precomputed ICs Directly Within DPOR**

Conditional independence consists in checking independence at the given state.

**Definition 1 (conditional independence).** *Two events* α *and* β *are* independent *in state* S*, written* indep(α, β, S)*, if (i1) neither of them* enables *the other from* S*; and (i2) if they are both enabled in* S*, then* S →<sup>α·β</sup> S′ *and* S →<sup>β·α</sup> S′*.*

The use of conditional independence in POR theory was first studied in [15], and it has been partially applied within the DPOR algorithm in CSDPOR [3]. Function *updateSleepCS* at line 14 and the modification of *sleep* at line 16 encapsulate this partial application of CSDPOR (the code of *updateSleepCS* is not shown because it is not affected by our extension). Intuitively, *updateSleepCS* works as follows: when a reversible race is found in the current sequence being explored, it builds an *alternative* sequence that corresponds to the reversed race, and then checks whether the states reached after running the two sequences are the same. If they are, it adds the alternative sequence to the corresponding *sleep* set so that this sequence is not fully explored when backtracking. Therefore, sleep sets can contain *sequences* of events, which can be propagated down via the rule of line 16 (i.e., if the event being explored is the head of a sequence in the sleep set, then the tail of the sequence is propagated down). In essence, the technique used by CSDPOR to check (i2) of Definition 1 consists in checking state equivalence with an alternative sequence in the current state (hence it is conditional) and, if the check succeeds, it is exploited in the *sleep* set only (and not in the *backtrack* set).

*Example 2.* Let us explain the intuition behind the reductions that CSDPOR achieves w.r.t. unconditional-independence-based DPOR on the example. In state 1, when the algorithm selects q and detects the reversible race between q and p, it computes the alternative sequence q.p and realizes that s[p.q] = s[q.p], and hence adds q.p to sleep(ε). Similarly, in state 2, it computes p.r.q and realizes that s[p.q.r] = s[p.r.q], adding r.q to sleep(p). Besides these two alternative sequences, it computes two more. Overall, CSDPOR explores 2 complete sequences (p.q.r and q.r.p) and 13 states (the 9 states shown, plus 4 additional states to compute the alternative sequences).

Instead of computing state equivalence to check (i2) as in [3], our approach assumes precomputed *independence constraints* (ICs) for all pairs of atomic blocks in the program. ICs will be evaluated at the appropriate state to determine the independence between pairs of concurrent events executing such atomic blocks.

**Definition 2 (ICs).** *Consider two events* α *and* β *that execute, respectively, the atomic blocks* ᾱ *and* β̄*. The* independence constraints I<sub>ᾱ,β̄</sub> *are a set of boolean expressions (constraints) over the variables accessed by* α *and* β *(including local and global variables) s.t., if some constraint* C *in* I<sub>ᾱ,β̄</sub> *holds in state* S*, written* C(S)*, then condition (i2) of* indep(α, β, S) *holds.*

Our first contribution is in lines 11–13, where ICs are used within DPOR as follows. Before executing *updateSleepCS* at line 14, we check whether some constraint in I<sub>ē,n̄</sub> holds in the state s[E′.û], obtained by building the sequence E′.û, where u = dep(E, e, n). Only if this check fails do we proceed to execute *updateSleepCS*. The advantages of our check w.r.t. *updateSleepCS* are: (1) the alternative execution sequence built by *updateSleepCS* is strictly longer than ours, and hence more states will be explored, and (2) *updateSleepCS* must check state equivalence, while we only evaluate boolean expressions. Yet, because our IC is an approximation, if we fail to prove independence we can still use *updateSleepCS*.

*Example 3.* Consider the ICs in Fig. 1 (bottom left), which provide the constraints ensuring the independence of each pair of atomic blocks, and whose synthesis is explained in Sect. 4.1. In the exploration of the example, when the algorithm detects the reversible race between q and p in state 1, instead of computing q.p and then comparing s[p.q] = s[q.p] as in CSDPOR, we just check the constraint in I<sub>p̄,q̄</sub> at state ε, i.e., in z = −2 (line 11), and since it succeeds, q.p is added to sleep(ε). The same happens at states 2, again at 1 (when backtracking with r), and 5. This way we avoid the exploration of the additional 4 states due to the computation of the alternative sequences in Example 2 (namely q.p, r.p and r.q from state 0, and r.q from 1). The algorithm however still explores many redundant derivations, namely states 4, 5, 6, 7 and 8.

### **3.2 Transitive Uniformity: How to Further Exploit ICs Within DPOR**

The challenge now is to use ICs, and therefore conditional independence, in the remaining dependency checks performed by the DPOR algorithm, and most importantly in the race detection (line 6). In our example, that would avoid adding q and r to back(ε) and r to back(p), and hence would make the algorithm explore only the sequence p.q.r. Although this can be done in our example, it is unsound in general, as the following counter-example illustrates.

*Example 4.* Consider the same example but starting from the initial state z = −1, x = −2. During the exploration of the first sequence p.q.r, the algorithm will not find any race, since p and q are independent in z = −1, q and r are independent in z = x = −1, and p and r are always independent. Therefore, no sequences other than p.q.r, with final result z = 0, will be explored. There is however a non-equivalent sequence, r.q.p, which leads to a different final state z = −1.
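The counter-example can be replayed by brute-force enumeration (a sketch: q and r are the transition systems given in Sect. 4.1, while taking p to be x = x + 1 is our reconstruction of Fig. 1, consistent with W(p) = {x} in Example 5):

```python
from itertools import permutations

# q and r follow the TSs of Sect. 4.1; p = "x = x + 1" is assumed.
blocks = {
    "p": lambda x, z: (x + 1, z),                    # x = x + 1
    "q": lambda x, z: (x, x) if z >= 0 else (x, z),  # if (z >= 0) z = x
    "r": lambda x, z: (x + 1, z + 1),                # x = x + 1; z = z + 1
}

def run(seq, x, z):
    for name in seq:
        x, z = blocks[name](x, z)
    return x, z

# From z = -1, x = -2: p.q.r ends with z = 0, while the non-equivalent
# sequence r.q.p ends with z = -1, exactly as stated above.
assert run("pqr", -2, -1) == (0, 0)
assert run("rqp", -2, -1) == (0, -1)

# From z = -2, x = -2 (the tree of Fig. 1), by contrast, all six
# interleavings happen to reach one and the same final state.
assert len({run(seq, -2, -2) for seq in permutations("pqr")}) == 1
```

In r.q.p, the step of r raises z to 0 before q runs, so q and p no longer commute; this is precisely the dependency that conditional independence alone fails to detect.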

The problem of using conditional independence within the POR theory was already identified by Katz and Peled [15]. Essentially, the main idea of POR is that the different linearizations of a partial order yield equivalent executions that can be obtained by swapping adjacent independent events. However, this is no longer true with conditional dependency. In Example 4, using conditional independence, the partial order of the explored derivation p.q.r would be empty, which means there would be 6 possible linearizations. However, r.q.p is not equivalent to p.q.r, since q and p are dependent in s[r], i.e., when z = 0. An extra condition, called *uniformity*, is proposed in [15] to allow using conditional independence within the POR theory. Intuitively, *uniform independence* adds a condition to Definition 1 to ensure that independence holds at all successor states reached by those events that are enabled and are *uniformly independent* of the two events whose independence is being proved. While this notion can be checked *a posteriori* in a given exploration, it is unclear how it could be applied in a dynamic setting where decisions are made *a priori*. Here we propose a weaker notion of uniformity, called *transitive uniformity*, for which we have been able to prove that the *dynamic* POR framework is sound. The difference with [15] is that our extra condition ensures that independence holds at all successor states reached by *all* events that are enabled, which is thus a superset of the events considered in [15]. We note that the general happens-before definition of [1,2] does not capture our transitive uniform conditional independence below (namely, property seven of [1,2] does not hold); hence CDPOR cannot be seen as an instance of SDPOR, but rather as an extension.

**Definition 3.** *The* transitive uniform *conditional independence relation, written* unif(α, β, S)*, fulfills (i1) and (i2), and (i3)* unif(α, β, S<sub>γ</sub>) *holds for all* γ ∉ {α, β} *enabled in* S*, where* S<sub>γ</sub> *is defined by* S →<sup>γ</sup> S<sub>γ</sub>*.*

During the exploration of the sequence p.q.r in Example 4, the algorithm will now find a reversible race between p and q, since the independence is not transitively uniform in z = −1, x = −2. Namely, *(i3)* does not hold: r is enabled, and we have x = −1 and z = 0 in s[r], which implies ¬*unif*(p, q, s[r]) (*(i2)* does not hold).
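Definition 3 can be checked directly (if exponentially) for the single-event processes of this example. The sketch below reuses the reconstructed blocks (q and r follow Sect. 4.1; p = x = x + 1 is our assumption) and confirms that p and q are conditionally independent in the initial state of Example 4, but not transitively uniformly so:

```python
# States are (ran, x, z); a process is enabled until it has taken its
# single event, so the recursion of condition (i3) terminates.
blocks = {
    "p": lambda x, z: (x + 1, z),
    "q": lambda x, z: (x, x) if z >= 0 else (x, z),
    "r": lambda x, z: (x + 1, z + 1),
}

def step(state, name):
    ran, x, z = state
    return (ran | {name}, *blocks[name](x, z))

def enabled(state):
    return {n for n in blocks if n not in state[0]}

def indep(a, b, state):
    # condition (i2): both orders reach the same (x, z); single-event
    # processes never enable each other, so (i1) is vacuous here
    return step(step(state, a), b)[1:] == step(step(state, b), a)[1:]

def unif(a, b, state):
    # conditions (i2) and (i3) of Definition 3, checked recursively
    return indep(a, b, state) and all(
        unif(a, b, step(state, g)) for g in enabled(state) - {a, b})

s0 = (frozenset(), -2, -1)     # the initial state of Example 4
assert indep("p", "q", s0)     # conditionally independent in s0...
assert not unif("p", "q", s0)  # ...but after r we have z = 0, where
                               # p and q no longer commute
```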

We now introduce sufficient conditions for transitive uniformity that can be precomputed statically and checked efficiently in our dynamic algorithm. Condition *(i1)* is computed dynamically as usual during the exploration, simply storing enabling dependencies. Condition *(i2)* is provided by the ICs. Our sufficient conditions to ensure *(i3)* are as follows. For each atomic block b, we precompute *statically* (before executing DPOR) the set W(b) of global variables that can be modified by the full execution of b, i.e., by an instruction in b or by any other block called from, or enabled by, b (transitively). To this end, we perform a simple analysis that consists in: (1) first, we build the call graph of the program to establish the calling relationships between the blocks in the program; note that when we find a process creation instruction spawn(P[ini]), we have a calling relationship between the block in which the spawn instruction appears and P. (2) We obtain (by a fixed-point computation) the largest relation fulfilling that g belongs to W(b) if either g is *modified* by an instruction in b, or g belongs to W(c) for some block c called from b. This computation can be done with different levels of precision, and it is well studied in the static analysis field [18]. We let *G*(C) be the set of global variables evaluated in constraint C of I.
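Steps (1) and (2) can be written as a small fixpoint over the call graph (illustrative Python; the block names and the example graph below are ours):

```python
def written_globals(direct_writes, calls):
    """direct_writes: block -> globals written by its own instructions;
    calls: block -> blocks it calls or spawns. Returns W."""
    W = {b: set(ws) for b, ws in direct_writes.items()}
    changed = True
    while changed:                      # iterate until a fixed point
        changed = False
        for b in W:
            for c in calls.get(b, ()):
                if not W[c] <= W[b]:    # propagate callee writes up
                    W[b] |= W[c]
                    changed = True
    return W

# main spawns p and r; r calls a helper h that writes the global z
W = written_globals(
    direct_writes={"main": set(), "p": {"x"}, "r": {"x"}, "h": {"z"}},
    calls={"main": {"p", "r"}, "r": {"h"}},
)
assert W["r"] == {"x", "z"}     # r writes x itself and z via h
assert W["main"] == {"x", "z"}  # everything reachable from main
```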

**Definition 4 (sufficient condition for transitive uniformity,** U⇒**).** *Let* E *be a sequence,* I *a set of constraints,* α *and* β *two events enabled in* s[E]*, and* T = next<sub>[E]</sub>(enabled(s[E])) \ {α, β}*. We define* U⇒(I, α, β, s[E]) ≡ ∃C ∈ I : C(s[E]) ∧ (*G*(C) ∩ ⋃<sub>t∈T</sub> W(t̄)) = ∅*.*

Intuitively, our sufficient condition ensures transitive uniformity by checking that the global variables involved in the constraint C of the IC used to ensure the uniformity condition are not modified by other enabled events in the state.

**Theorem 1.** *Given a sequence* E *and two events* α *and* β *enabled in* s[E]*, we have that* U⇒(Iα, ¯ <sup>β</sup>¯, α, β, s[E]) ⇒ *unif*(α, β, s[E])*.*
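Definition 4 translates almost literally into executable form (a sketch; representing each IC as a predicate/read-set pair is our choice, not the paper's syntax). With the ICs and write sets of the running example, it reproduces the checks made later in Example 5:

```python
def u_implies(ics, state, other_enabled, W):
    """ics: (predicate, globals-read) pairs for the two blocks checked;
    other_enabled: blocks of enabled events besides the two;
    W: block -> set of globals it may write (precomputed statically)."""
    written_by_others = set().union(*(W[t] for t in other_enabled))
    return any(pred(state) and not (gvars & written_by_others)
               for pred, gvars in ics)

W = {"p": {"x"}, "q": {"z"}, "r": {"x", "z"}}
ic_pq = [(lambda s: s["z"] <= -1, {"z"})]   # I_{p,q} = {z <= -1}

# At the initial state of Fig. 1, r is enabled and may write z, so
# transitive uniformity of p and q cannot be guaranteed even though
# the constraint z <= -1 itself holds.
assert not u_implies(ic_pq, {"x": -2, "z": -2}, {"r"}, W)
# once no other event is enabled, the constraint alone suffices
assert u_implies(ic_pq, {"x": -2, "z": -2}, set(), W)
```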

### **3.3 The Constrained DPOR Algorithm**

The code highlighted in blue in Algorithm 1 provides the extension to apply conditional independence within DPOR. In addition to the pruning explained in Sect. 3.1, it achieves two further types of pruning:


It is important to note also that the inferred conditional independencies are recorded in the happens-before relation, to be re-used later in subsequent computations of the ≲ and dep definitions.

*Example 5.* Let us describe the exploration of the example in Fig. 1 using CDPOR. At state 1, the algorithm checks whether p and q are in race. U⇒(I<sub>p̄,q̄</sub>, p, q, S) does not hold in z = −2 since, although (z ≤ −1) ∈ I<sub>p̄,q̄</sub> holds, we have *G*(z ≤ −1) ∩ W(r) = {z} ≠ ∅. Process q is hence added to back(ε). On the other hand, since (z ≤ −1) ∈ I<sub>p̄,q̄</sub> holds in z = −2 (line 11), q.p is added to sleep(ε) (line 12). At state 2 the algorithm checks the possible race between q and r after executing p. This time the transitive uniformity of the independence of q and r holds, since (z ≤ −2) ∈ I<sub>q̄,r̄</sub> holds and there are no enabled events outside {q, r}. Our algorithm therefore avoids adding r to back(p) (pruning 1 above). The algorithm also checks the possible race between p and r in z = −2. Again, true ∈ I<sub>p̄,r̄</sub> holds and is uniform since *G*(true) = ∅ (pruning 1). The algorithm finishes the exploration of sequence p.q.r and then backtracks with q at state 0. At state 5 the algorithm selects process r (p is in the sleep set of 5 since it is propagated down from the q.p in sleep(ε)). It then checks the possible race between q and r, which is again discarded (pruning 1), since transitive uniformity of the independence of q and r can be proved: (z ≤ −2) ∈ I<sub>q̄,r̄</sub> holds in z = −2 and W(p) ∩ *G*(z ≤ −2) = ∅, where p is the only enabled event outside {q, r} and W(p) = {x}. This avoids adding r to back(ε). Finally, at state 5, p is propagated down in the new sleep set (pruning 2), since as before true ∈ I<sub>p̄,r̄</sub> ensures transitive uniformity. The exploration therefore finishes at state 6.

Overall, on our working example, CDPOR explores only one complete sequence, p.q.r, and the partial sequence q.r (a total of 6 states). The latter could also be avoided if a more precise sufficient condition for uniformity were provided which, in particular, could detect that the independence of p and q in ε is transitively uniform, i.e., that it still holds after r (even though r writes variable z).

**Theorem 2 (soundness).** *For each Mazurkiewicz trace* T *defined by the happens-before relation, Explore(*ε, ∅*) in Algorithm 1 explores a complete execution sequence* T′ *that reaches the same final state as* T*.*

# **4 Automatic Generation of ICs Using SMT**

Generating ICs amounts to proving (conditional) program equivalence w.r.t. the global memory. While the problem is very hard in general, proving equivalence of smaller blocks of code is more tractable. This section introduces a novel SMT-based approach to synthesize ICs between pairs of atomic blocks of code. Our ICs can be used within any transformation or analysis tool (beyond DPOR) that can gain accuracy or efficiency by knowing that fragments of code (conditionally) commute. Section 4.1 first describes the inference for basic blocks; Sect. 4.2 extends it to handle process creation; and Sect. 4.3 outlines other extensions, such as loops, method invocations and data structures.

### **4.1 The Basic Inference**

In this section we consider blocks of code containing conditional statements and assignments using linear integer arithmetic (LIA) expressions. The first step to carry out the inference is to transform q and r into two *deterministic* transition systems (TSs), T_q and T_r (note that q and r are assumed to be deterministic), and to compose them in the two orders, T_{q·r} and T_{r·q}. Consider r and q in Fig. 1, whose associated TSs are (primed variables represent the final values of the variables):

$$\begin{array}{ll} T_q: z \ge 0 \to z' = x; & T_r: true \to x' = x+1,\ z' = z+1; \\ \phantom{T_q:} z < 0 \to z' = z; \end{array}$$

The code to be analyzed is the composition of T_q and T_r in both orders:

$$\begin{array}{llll} T_{q\cdot r}: & z \ge 0 \to x' = x + 1,\ z' = x + 1; & T_{r\cdot q}: & z \ge -1 \to x' = x + 1,\ z' = x + 1; \\ & z < 0 \to x' = x + 1,\ z' = z + 1; & & z < -1 \to x' = x + 1,\ z' = z + 1; \end{array}$$
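The two compositions above can be reproduced mechanically. The following sketch (our own dict-based representation, not part of the paper's tooling) encodes T_q and T_r as lists of guard/update pairs and composes them in both orders:

```python
# Sketch (our representation): a deterministic TS is a list of
# (guard, update) pairs over a state dict; compose(a, b) builds T_{a.b}.

def compose(ts_a, ts_b):
    """T_{a.b}: guards of b are evaluated on the state produced by a."""
    return [
        (lambda s, ga=ga, gb=gb, ua=ua: ga(s) and gb(ua(s)),
         lambda s, ua=ua, ub=ub: ub(ua(s)))
        for ga, ua in ts_a
        for gb, ub in ts_b
    ]

def run(ts, s):
    for guard, update in ts:          # deterministic: guards are disjoint
        if guard(s):
            return update(s)

# T_q and T_r from Fig. 1 / Sect. 4.1
T_q = [(lambda s: s["z"] >= 0, lambda s: {**s, "z": s["x"]}),
       (lambda s: s["z"] < 0,  lambda s: dict(s))]
T_r = [(lambda s: True, lambda s: {**s, "x": s["x"] + 1, "z": s["z"] + 1})]

s0 = {"x": 3, "z": -2}                 # satisfies the IC constraint z <= -2
print(run(compose(T_q, T_r), s0))      # {'x': 4, 'z': -1}
print(run(compose(T_r, T_q), s0))      # same state: the blocks commute here
```

Running the composed systems from z = −2 yields the same final state in both orders, while from z = −1 with x ≠ z the two orders diverge, matching the constraints computed in Example 6.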

In what follows we denote by T_{a·b} the deterministic TS obtained from the concatenation of the blocks a and b, such that all variables are assigned in one instruction using parallel assignment. We let A|_G be the restriction to the global memory of the assignments in A (i.e., ignoring the effect on local variables). The following definition provides an SMT formula over LIA (a Boolean formula whose atoms are equalities and inequalities over linear integer arithmetic expressions) which encodes the independence of the two blocks.

**Definition 5 (IC generation).** *Let us consider two atomic blocks* q *and* r *and a global memory* G*, and let* C_i → A_i *(resp.* C′_j → A′_j*) be the transitions in* T_{q·r} *(resp.* T_{r·q}*). We obtain* F_{q,r} *as the SMT formula* ⋁_{i,j} (C_i ∧ C′_j ∧ A_i|_G = A′_j|_G)*.*

Intuitively, the SMT encoding in the above definition has as solutions all those states where both a condition C_i of a transition in T_{q·r} and a condition C′_j of a transition in T_{r·q} hold (and hence are compatible) and the final global state after executing all instructions in the two transitions (denoted A_i and A′_j) remains the same.

Next, we generate the constraints of the independence condition I_{q,r} by obtaining a compact representation of all models over linear arithmetic atoms (computed by an *allSAT* SMT solver) satisfying F_{q,r}. In particular, we add a constraint to I_{q,r} for every obtained model.

*Example 6.* In the example, we have the TSs with the following conditions and assignments:

$$\begin{array}{ll|ll} T_{q \cdot r}: & C_1: z \ge 0 \quad A_1: x' = x + 1,\ z' = x + 1 & T_{r \cdot q}: & C'_1: z \ge -1 \quad A'_1: x' = x + 1,\ z' = x + 1 \\ & C_2: z < 0 \quad A_2: x' = x + 1,\ z' = z + 1 & & C'_2: z < -1 \quad A'_2: x' = x + 1,\ z' = z + 1 \end{array}$$

and we obtain a set with three constraints, I_{q,r} = {(z ≥ 0), (z = x), (z < −1)}, by computing all models satisfying the resulting formula:

$$\begin{aligned} &(z \ge 0 \land z \ge -1 \land x + 1 = x + 1 \land x + 1 = x + 1) \lor \\ &(z \ge 0 \land z < -1 \land x + 1 = x + 1 \land x + 1 = z + 1) \lor \\ &(z < 0 \land z \ge -1 \land x + 1 = x + 1 \land z + 1 = x + 1) \lor \\ &(z < 0 \land z < -1 \land x + 1 = x + 1 \land z + 1 = z + 1) \end{aligned}$$

The second conjunction is unsatisfiable since there is no model with both C_1 and C′_2. On the other hand, the equalities of the first and the last conjunctions always hold, which gives us the constraints z ≥ 0 and z ≤ −2. Finally, all equalities hold when x = z, which gives us the third constraint as a result of our SMT encoding.
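The allSAT step can be mimicked on this small example without a solver: enumerating a window of concrete states and projecting each satisfying state onto the truth assignment of the three atoms recovers the same constraint set. The sketch below is ours (an actual implementation would query an allSAT-capable SMT solver, as the tool does with Barcelogic):

```python
# Brute-force stand-in (ours) for the allSAT query on F_{q,r} of Example 6.

def F(x, z):
    """F_{q,r} with the trivially-true equalities x+1 = x+1 dropped."""
    return ((z >= 0 and z >= -1) or
            (z < 0 and z >= -1 and z + 1 == x + 1) or
            (z < 0 and z < -1))

# candidate atoms for the projection (assumed; a solver derives its own)
atoms = {"z >= 0": lambda x, z: z >= 0,
         "z == x": lambda x, z: z == x,
         "z <= -2": lambda x, z: z <= -2}

# project every satisfying state in a small window onto the atoms
assignments = set()
for x in range(-4, 5):
    for z in range(-4, 5):
        if F(x, z):
            assignments.add(tuple(sorted(n for n, a in atoms.items() if a(x, z))))

# every satisfying state makes at least one atom true, so the three atoms
# cover F_{q,r}: I_{q,r} = {z >= 0, z == x, z <= -2}
assert all(len(a) > 0 for a in assignments)
print(sorted(assignments))
```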

Note that, since in this case F_{q,r} describes not only a sufficient but also a necessary condition for independence, the obtained ICs are also sufficient and necessary conditions for independence. This allows removing line 14 in the algorithm, since the context-sensitive check will fail if line 11 does. However, the extensions that follow do not ensure that the generated ICs are necessary conditions.

### **4.2 IC for Blocks with Process Creation**

Consider the following two methods whose bodies constitute atomic blocks (e.g., the lock is taken at the method start and released at the return). They are inspired by a highly concurrent computation of Fibonacci used in the experiments. Variables nr and r are global to all processes:


We now want to infer I_{fib(v),fib(v1)}, I_{fib(v),res(v1)}, and I_{res(v),res(v1)}. The first step is to obtain, for each block r, a *TS with uninterpreted functions*, denoted TS^u_r, in which transitions are of the form C → (A, S), where A are the parallel assignments as in Sect. 4.1, and S is a multiset containing calls to fresh *uninterpreted* functions associated with the processes spawned within the transition (i.e., a process creation spawn(P) is associated with an uninterpreted function spawn_P).

$$\begin{array}{l} T^u_{\mathsf{fib}}: v \le 1 \to (skip, \{spawn\_res(v)\}) \\ \phantom{T^u_{\mathsf{fib}}:} v > 1 \to (skip, \{spawn\_fib(v-1), spawn\_fib(v-2)\}) \\ T^u_{\mathsf{res}}: nr > 0 \to (nr' = 0, r' = v, \{\}) \\ \phantom{T^u_{\mathsf{res}}:} nr \le 0 \to (nr' = 1, r' = 0, \{spawn\_res(r+v)\}) \end{array}$$

The following definition extends Definition 5 to handle process creation. Intuitively, it associates a fresh variable to each different element in the multisets (mapping P below) and enforces equality among the multisets.

**Definition 6 (IC generation with process creation).** *Let us consider* TS^u_{r·q} *and* TS^u_{q·r}*. Let* P *be a mapping that assigns a fresh variable to each call occurring in a multiset* S *of a transition* C → (A, S) ∈ TS^u_{r·q} ∪ TS^u_{q·r}*, and let* P(S) *be the replacement of the elements in the multiset* S *by applying the mapping* P*. Let* C_i → (A_i, S_i) *(resp.* C′_j → (A′_j, S′_j)*) be the transitions in* TS^u_{q·r} *(resp.* TS^u_{r·q}*). We obtain* F_{q,r} *as the SMT formula* ⋁_{i,j} (C_i ∧ C′_j ∧ A_i|_G = A′_j|_G ∧ P(S_i) ≡ P(S′_j))*.*

For simplicity and efficiency, we consider that ≡ corresponds to syntactic equality of the multisets. However, in order to improve the precision of the encoding, we apply P to S_i and S′_j replacing two process creations by the same variable if they are equal modulo associativity and commutativity (AC) of the arithmetic operators, after substituting the equalities already imposed by A_i|_G = A′_j|_G (see the example below). A more precise treatment can be achieved by using equality with uninterpreted functions (EUF) to compare the multisets of processes.
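The AC part of this improvement can be pictured with a small sketch (our own representation: a spawn is a pair of a process name and the tuple of summands of its argument; the substitution of imposed equalities is omitted):

```python
from collections import Counter

def norm(proc, summands):
    # a sum of variables is normalized modulo AC by sorting its summands
    return (proc, tuple(sorted(summands)))

def apply_P(multisets):
    """Map AC-equal spawns to the same fresh variable, as the mapping P does."""
    mapping, out = {}, []
    for ms in multisets:
        renamed = Counter()
        for proc, summands in ms:
            key = norm(proc, summands)
            mapping.setdefault(key, f"x{len(mapping) + 1}")
            renamed[mapping[key]] += 1
        out.append(renamed)
    return out

# spawn_res(v + v1) vs. spawn_res(v1 + v): equal modulo AC
m1, m2 = apply_P([[("res", ("v", "v1"))], [("res", ("v1", "v"))]])
print(m1 == m2)  # True: both multisets map to the same fresh variable
```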

*Example 7.* Let us show how we apply the above definition to infer I_{res(v),res(v1)}. We first build T_{res(v)·res(v1)} from T_{res(v)} by composing it with itself:

$$\begin{aligned} nr &\le 0 \to (nr' = 0, r' = v_1, \{spawn\_res(r+v)\}) \\ nr &> 0 \to (nr' = 1, r' = 0, \{spawn\_res(v+v_1)\}) \end{aligned}$$

and T_{res(v1)·res(v)}, which is like the one above but exchanging v and v_1. Next, we define P = {spawn_res(r + v) → x_1, spawn_res(v + v_1) → x_2, spawn_res(r + v_1) → x_3, spawn_res(v_1 + v) → x_4} and apply it with the improvement described above:

$$\begin{aligned} &(nr \le 0 \land nr \le 0 \land 0 = 0 \land v = v_1 \land \{x_1\} = \{x_1\}) \lor \\ &(nr \le 0 \land nr > 0 \land 0 = 1 \land v_1 = 0 \land \{x_1\} = \{x_4\}) \lor \\ &(nr > 0 \land nr \le 0 \land 1 = 0 \land 0 = v \land \{x_2\} = \{x_3\}) \lor \\ &(nr > 0 \land nr > 0 \land 1 = 1 \land 0 = 0 \land \{x_2\} = \{x_2\}) \end{aligned}$$

Note that the second and third conjunctions are infeasible and hence can be removed from the formula. In the first one, spawn_res(r + v_1) is replaced by x_1 (instead of x_3), since we can substitute v_1 by v as v = v_1 is imposed in the conjunction; and in the fourth one, spawn_res(v_1 + v) is replaced by x_2 (instead of x_4), since it is equal modulo AC to spawn_res(v + v_1). We finally have:

$$(nr \le 0 \land nr \le 0 \land 0 = 0 \land v = v_1) \quad \lor \quad (nr > 0 \land nr > 0 \land 1 = 1 \land 0 = 0)$$

As before, I_{res(v),res(v1)} = {(nr > 0), (v = v_1)} is then obtained by computing all satisfying models. In the same way we obtain I_{fib(v),res(v1)} = I_{fib(v),fib(v1)} = {true}. The following theorem states the soundness of the inference of ICs, which holds by construction of the SMT formula.

**Theorem 3 (soundness of independence conditions).** *Given the assumptions in Definition 6, if* ∃C ∈ I_{r,q} *s.t.* C(S) *holds, then* S −r·q→ S′ *and* S −q·r→ S′*.*

We also obtain a necessary condition in those instances where the use of syntactic equality modulo AC on the multisets of created processes (as described above) does not lose precision. This can be checked when building the encoding.

### **4.3 Other Extensions**

We abstract loops from the code of the blocks so that we can handle them as uninterpreted functions similarly to Definition 6. Basically, for each loop, we generate as many uninterpreted functions as variables it modifies (excluding local variables of the loop) plus one to express all processes created inside the loop. The functions have as arguments the variables accessed by the loop (again excluding local variables). This transformation allows us to represent that each variable might be affected by the execution of the loop over some parameters, and then check in the reverse trace whether we get to the loop over the same parameters.

**Definition 7 (loop extraction for IC generation).** *Let us consider a loop* L *that accesses variables* x_1,...,x_n *and modifies variables* y_1,...,y_m *(excluding local loop variables), and let* l_1,...,l_{m+1} *be fresh function symbols. We replace* L *by the following code:*

$$\begin{aligned} & x'_1 = x_1; \ldots; x'_n = x_n; \; y_1 = l_1(x'_1, \ldots, x'_n); \ldots; \; y_m = l_m(x'_1, \ldots, x'_n); \\ & spawn(l_{m+1}(x'_1, \ldots, x'_n)); \quad (\textit{only if there are spawn operations inside the loop}) \end{aligned}$$

Existing dependency analyses can be used to infer the subset of x_1,...,x_n that affects each y_i, achieving more precision with a small pre-computation overhead.
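As an illustration of Definition 7, the transformation can be written as a small code generator (our own sketch; the variable and function names follow the definition):

```python
def abstract_loop(reads, writes, spawns_inside):
    """Replace a loop by uninterpreted-function summaries (Definition 7)."""
    code = [f"{x}p = {x};" for x in reads]             # x'_i = x_i
    args = ", ".join(f"{x}p" for x in reads)
    code += [f"{y} = l{i + 1}({args});" for i, y in enumerate(writes)]
    if spawns_inside:                                  # only if the loop spawns
        code.append(f"spawn(l{len(writes) + 1}({args}));")
    return code

# a loop reading x1, x2, writing y1, and spawning processes inside
for line in abstract_loop(["x1", "x2"], ["y1"], spawns_inside=True):
    print(line)
```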

The treatment of method invocations (or function calls) to be executed atomically within the considered blocks can be done analogously to loops by introducing one fresh function for every (non-local) variable that is modified within the method call and one more for the result. The parameters of these new functions are the original ones plus one for each accessed (non-local) variable. After the transformations for both loops and calls described above, we have TSs with function calls that are treated as uninterpreted functions in a similar way to Definition 6. However these functions can now occur in the conditions and the assignments of the TS. To handle them, we use again a mapping P to remove all function calls from the TS and replace them by fresh integer variables. After that the encoding is like in Definition 6, and we obtain an SMT formula over LIA, which is again sent to the allSAT SMT solver. Once we have obtained the models we replace back the introduced fresh variables by the function calls using the mapping P. Several simplifications on equalities involving function calls can be done before and after invoking the solver to improve the result. As a final remark, data structures like lists or maps have been handled by expressing their uses as function calls, hence obtaining constraints that include conditions on them.

# **5 Experiments**

In this section we report on experimental results that compare the performance of three DPOR algorithms: SDPOR [1,2], CSDPOR [3], and our proposal CDPOR. We have implemented and experimentally evaluated our method within the SYCO tool [3], a systematic testing tool for message-passing concurrent programs. SYCO can be used online through its web interface available at http://costa.fdi.ucm.es/syco. To generate the ICs, SYCO calls a new feature of the VeryMax program analyzer [6], which uses Barcelogic [5] as SMT solver. As benchmarks, we have borrowed the examples from [3] (available online from the previous URL) that were used to compare SDPOR with CSDPOR. They are classical concurrent applications: several concurrent sorting algorithms (QS, MS, PS), concurrent Fibonacci Fib, distributed workers Pi, a concurrent registration system Reg, a database DBP, and a consumer-producer interaction BB. These benchmarks feature the typical concurrent programming methodology in which computations are split into smaller atomic subcomputations which concurrently interleave their executions and work on the same shared data. Therefore, the concurrent processes are highly interfering, and both inferring ICs and applying DPOR algorithms to them becomes challenging.

We have executed each benchmark with input parameters of increasing size. A timeout of 60 s is used and, when reached, we write >X to indicate that for the corresponding measure we encountered X units up to that point (i.e., it is at least X). Table 1 shows the results of the executions for 6 different inputs. Column Tr shows the number of traces, S the number of states that the algorithms explore, and T the time in seconds it takes to compute them. For CDPOR, we also show the time Tsmt of inferring the ICs (since the inference is performed once for all executions, it is only shown in the first row). Times are obtained on an Intel(R) Core(TM) i7 CPU at 2.5 GHz with 8 GB of RAM (Linux Kernel 5.4.0). Columns G_s and G_cs show the time speedup of CDPOR over SDPOR and CSDPOR, respectively, computed by dividing each respective T by the T of CDPOR. Column G_smt shows the time speedup over CSDPOR including Tsmt in the time of CDPOR. We can see from the speedups that the gains of CDPOR increase exponentially with the size of the input in all examples. When compared with CSDPOR, we achieve reductions of up to 4 orders of magnitude for the largest inputs on which CSDPOR terminates (e.g., Pi, QS). It is important to highlight that the number of non-unitary sequences stored in sleep sets is 0 in every benchmark except BB, for which it remains quite low (namely, for BB(11) the peak is 22).

W.r.t. SDPOR, we achieve reductions of 4 orders of magnitude even for smaller inputs for which SDPOR terminates (e.g., PS). Note that since most examples reach the timeout, the gains are at least the ones we show, thus the


**Table 1.** Experimental evaluation

concrete numbers shown should be read as lower bounds. In some examples (e.g., BB, MS), though the gains are linear for the small inputs, when the size of the problem increases both SDPOR and CSDPOR time out, while CDPOR can still handle them efficiently.

Similar reductions are obtained for the number of states explored. In this case, the system times out when it runs into memory problems and the computation stops progressing (hence the number of explored states no longer increases with the input). As regards the time to infer the annotations, Tsmt, we observe that in most cases it is negligible compared to the exploration time of the other methods. QS is the only example that needs some seconds to be solved, due to the presence of several nested conditional statements combined with the use of built-in functions for lists, which makes the generated SMT encoding harder for the solver and the subsequent simplification step. Note that the inference is a pre-process which does not add complexity to the actual DPOR algorithm.

# **6 Related Work and Conclusions**

The notion of conditional independence in the context of POR was first introduced in [11,15]; [12] provides a similar strengthened dependency definition. CSDPOR was the first approach to exploit this notion within a state-of-the-art DPOR algorithm. We advance this line of research by fully integrating conditional independence within the DPOR framework, using *independence constraints* (ICs) together with the notion of *transitive uniform* conditional independence, which ensures the ICs hold along the whole execution sequence. Both ICs and transitive uniformity can be approximated statically and checked dynamically, making them effectively applicable within the dynamic framework. The work in [14,21] was the first to generate ICs, for processes with a single instruction following some predefined patterns. This is a problem strictly simpler than our inference of ICs, both in the type of ICs generated (restricted to the patterns) and in the single-instruction blocks they consider. Furthermore, our approach using an allSAT SMT solver is different from the CEGAR approach in [4]. The ICs are used in [14,21] for SMT-based bounded model checking, an approach to model checking fundamentally different from our stateless model checking setting. As a consequence, ICs are used in a different way: in our case with no bounds on the number of processes or on derivation lengths, but requiring a uniformity condition on independence in order to ensure soundness. Maximal causality reduction [13] is technically quite different from CDPOR, as it integrates SMT solving within the dynamic algorithm.

Finally, data-centric DPOR (DCDPOR) [7] presents a new DPOR algorithm based on a different notion of dependency, according to which the equivalence classes of derivations are based on read-write pairs of variables. Consider the following three simple processes {p, q, r} and the initial state x = 0:

p: write(x = 5), q: write(x = 5), r: read(x). In DCDPOR, we have only three different observation functions: (r, x) (reading the initial value), (r, p) (reading the value that p writes), and (r, q) (reading the value that q writes). This notion of observational dependency is therefore finer grained than the traditional one in DPOR. However, DCDPOR does not consider conditional dependency, i.e., it does not realize that (r, p) and (r, q) are equivalent, and hence that only two explorations are required (and explored by CDPOR). In conclusion, our approach and DCDPOR can complement each other: our approach would benefit from using a dependency based on the read-write pairs as proposed in DCDPOR, and DCDPOR would benefit from using conditional independence as proposed in our work. It remains as future work to study this integration. Related to DCDPOR, [16] extends optimal DPOR with observers. For the previous example, [16] needs to explore five executions: r.p.q and r.q.p are equivalent because p and q do not have any observer. Another improvement orthogonal to ours is to inspect dependencies over chains of events, as in [17,19].

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

CPS, Hardware, Industrial Applications

# **Formal Verification of a Vehicle-to-Vehicle (V2V) Messaging System**

Mark Tullsen1(B), Lee Pike<sup>2</sup>, Nathan Collins<sup>1</sup>, and Aaron Tomb<sup>1</sup>

<sup>1</sup> Galois, Inc., Portland, OR, USA {tullsen,conathan,atomb}@galois.com <sup>2</sup> Groq, Inc., Palo Alto, USA leepike@gmail.com

**Abstract.** Vehicle-to-Vehicle (V2V) communications is a "connected vehicles" standard that will likely be mandated in the U.S. within the coming decade. V2V, in which automobiles broadcast to one another, promises improved safety by providing collision warnings, but it also poses a security risk. At the heart of V2V is the communication messaging system, specified in SAE J2735 using the Abstract Syntax Notation One (ASN.1) data-description language. Motivated by numerous previous ASN.1 related vulnerabilities, we present the formal verification of an ASN.1 encode/decode pair. We describe how we generate the implementation in C using our ASN.1 compiler. We define *self-consistency* for encode/decode pairs that approximates functional correctness without requiring a formal specification of ASN.1. We then verify self-consistency and memory safety using symbolic simulation via the *Software Analysis Workbench*.

**Keywords:** Automated verification · ASN.1 · Vehicle-to-Vehicle · LLVM · Symbolic execution · SMT solver

# **1 Introduction**

At one time, automobiles were mostly mechanical systems. Today, a modern automobile is a complex distributed computing system. A luxury car might contain tens of millions of lines of code executing on 50–70 microcontrollers, also known as *electronic control units* (ECUs). A midrange vehicle might contain at least 25 ECUs, and that number continues to grow. In addition, various radios such as Bluetooth, Wifi, and cellular provide remote interfaces to an automobile.

With all that code and remotely-accessible interfaces, it is no surprise that software vulnerabilities can be exploited to gain unauthorized access to a vehicle. Indeed, in a study by Checkoway *et al.* on a typical midrange vehicle, for every remote interface, they found some software vulnerability that provided an attacker access to the vehicle's internal systems [4]. Furthermore, in each case,

This work was performed while Dr. Pike was at Galois, Inc.

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 413–429, 2018. https://doi.org/10.1007/978-3-319-96142-2\_25

once the interface is exploited, the attackers could parlay the exploit to make arbitrary modifications to other ECUs in the vehicle. Such modifications could include disabling lane assist, locking/unlocking doors, and disabling the brakes. Regardless of the interface exploited, full control can be gained.

Meanwhile, the U.S. Government is proposing a new automotive standard for vehicle-to-vehicle (V2V) communications. The idea is for automobiles to have dedicated short-range radios that broadcast a *Basic Safety Message* (BSM), containing, e.g., vehicle velocity, trajectory, and brake status, to other nearby vehicles (within approximately 300 m). V2V is a crash prevention technology that can be used to warn drivers of unsafe situations, such as a stopped vehicle in the roadway. Other potential warning scenarios include left-turn warnings when line-of-sight is blocked, blind spot/lane change warnings, and do-not-pass warnings. In addition to warning drivers, such messages could have even more impact for autonomous or vehicle-assisted driving. The U.S. Government estimates that if applied to the full national fleet, approximately one-half million crashes and 1,000 deaths could be prevented annually [15]. We provide a more detailed overview of V2V in Sect. 2.

While V2V communications promise to make vehicles safer, they also provide an additional security threat vector by introducing an additional radio and more software on the vehicle.

This paper presents initial steps in ensuring that V2V communications are implemented securely. We mean "secure" in the sense of having no flaws that could be a vulnerability; confidentiality and authentication are provided in other software layers and are not in scope here. Specifically, we focus on the security of encoding and decoding the BSM. The BSM is defined using ASN.1, a data description language in widespread use. It is not an exaggeration to say that ASN.1 is the backbone of digital communications; ASN.1 is used to specify everything from the X.400 email protocol to voice over IP (VoIP) to cellular telephony. While ASN.1 is pervasive, it is a complex language that has been amended substantially over the past few decades. Over 100 security vulnerabilities have been reported for ASN.1 implementations in MITRE's Common Vulnerability Enumeration (CVE) [14]. We introduce ASN.1 and its security vulnerabilities in Sect. 3.

This paper presents the first work in formally verifying a subsystem of V2V. Moreover, despite the pervasiveness and security-critical nature of ASN.1, it is the first work we are aware of in which any ASN.1 encoder (which translates ASN.1 messages into a byte stream) and decoder (which recovers an ASN.1 message from a byte stream) has been formally verified. The only previous work in this direction is by Barlas *et al.*, who developed a translator from ASN.1 into CafeOBJ, an algebraic specification and verification system [1]. Their motivation was to allow reasoning about broader network properties, of which an ASN.1 specification may be one part; their work does not address ASN.1 encoding or decoding and appears to be preliminary.

The encode/decode pair is first generated by Galois' ASN.1 compiler, part of the *High-Assurance ASN.1 Workbench* (HAAW). The resulting encode/decode pair is verified using Galois' open source *Software Analysis Workbench* (SAW), a state-of-the-art symbolic analysis engine [6]. Both tools are further described in Sect. 4.

In Sect. 5 we state the properties verified: we introduce the notion of self-consistency for encode/decode verification, which approximates functional correctness without requiring a formal specification of ASN.1 itself. We then describe our approach to verifying the self-consistency and memory safety of the C implementation of the encode/decode pair in Sect. 6, using compositional symbolic simulation as implemented in SAW. In Sect. 7 we put our results into context.
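To make the self-consistency property concrete, here is a toy encode/decode pair in Python (ours, unrelated to the generated C code): self-consistency demands that decoding any encoded message recovers the message, rather than requiring conformance to a full ASN.1 semantics.

```python
# Toy pair: two constrained integers packed one per byte (ranges chosen to
# echo the MsgTx/MsgRx constraints of Sect. 3.2).  A real BSM encoder is
# far more involved.

def encode(msg):
    tx_id, rx_id = msg
    assert 1 <= tx_id <= 5 and 1 <= rx_id <= 7   # schema constraints
    return bytes([tx_id, rx_id])

def decode(buf):
    if len(buf) != 2:
        return None                               # reject malformed input
    tx_id, rx_id = buf
    if not (1 <= tx_id <= 5 and 1 <= rx_id <= 7):
        return None
    return (tx_id, rx_id)

# self-consistency, checked exhaustively here; SAW checks the C pair
# symbolically over all inputs
assert all(decode(encode((t, r))) == (t, r)
           for t in range(1, 6) for r in range(1, 8))
print("decode . encode = id on all messages")
```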

# **2 Vehicle-to-Vehicle Communications**

As noted in the introduction, V2V is a short-range broadcast technology with the purpose of making driving safer by providing early warnings. In the V2V system, the BSM is the key message broadcast, at a frequency of up to 10 Hz (it can be lower due to congestion control). The BSM must be compatible between all vehicles, so it is standardized under SAE J2735 [7].

The BSM is divided into Part I and Part II, and both are defined with ASN.1. Part I is called the *BSM Core Data* and is part of every message broadcast. Part I includes positional data (latitude, longitude, and elevation), speed, heading, and acceleration. Additionally, it includes various vehicle state information, including transmission status (e.g., neutral, park, forward, reverse), the steering wheel angle, braking system status (e.g., whether the brakes are applied, whether anti-lock brakes are available/engaged, etc.), and vehicle size. Our verification, described in Sect. 6, is over Part I.

Part II is optional and extensible. Part II could include, for example, regionally-relevant data. It can also include additional vehicle safety data, including, for example, which of the vehicle's exterior lights are on. It may include information about whether a vehicle is a special vehicle or performing a critical mission, such as a police car in an active pursuit or an ambulance with a critical patient. It can include weather data, and obstacle detection.

### **3 ASN.1**

*Abstract Syntax Notation One* (ASN.1) is a standardized data description language in widespread usage. Our focus in this section is to give a sense of what ASN.1 is as well as its complexity. We particularly focus on aspects that have led to security vulnerabilities.

### **3.1 The ASN.1 Data Description Language and Encoding Schemes**

ASN.1 was first standardized in 1984, with many revisions since. ASN.1 is a data description language for specifying messages; although it can express relations between request and response messages, it was not designed to specify stateful protocols. While ASN.1 is "just" a data description language, it is quite large and complex. Indeed, merely parsing ASN.1 specifications is difficult. Dubuisson notes that the grammar of ASN.1 (1997 standard) results in nearly 400 shift/reduce conflicts and over 1,300 reduce/reduce conflicts in an LALR(1) parser generator, while an LL(k) parser generator results in over 200 production rules beginning with the same lexical token [8]. There is a by-hand transformation of the grammar into an LL(1)-compliant grammar, albeit with no formal proof of their equivalence [9].

Not only is the syntax of ASN.1 complex, but so is its semantics. ASN.1 contains a rich datatype language. There are at least 26 base types, including arbitrary integers, arbitrary-precision reals, and 13 kinds of string types. Compound datatypes include sum types (e.g., CHOICE and SET), records with subtyping (e.g., SEQUENCE), and recursive types. There is a complex constraint system (ranges, unions, intersections, etc.) on the types. Subsequent ASN.1 revisions support open types (providing a sort of dynamic typing), versioning to support forward/backward compatibility, user-defined constraints, parameterized specifications, and so-called *information objects* which provide an expressive way to describe relations between types.

So far, we have only highlighted the data description language itself. A set of *encoding rules* specify how the ASN.1 messages are serialized for transmission on the wire. Encoder and decoder pairs are always with respect to a specific schema and encoding rule. There are at least nine standardized ASN.1 encoding rules. Most rules describe 8-bit byte (octet) encodings, but three rule sets are dedicated to XML encoding. Common encoding rules include the Basic Encoding Rules (BER), Distinguished Encoding Rules (DER), and Packed Encoding Rules (PER). The encoding rules do not specify the transport layer protocol to use (or any lower-level protocols, such as the link or physical layer).

### **3.2 Example ASN.1 Specification**

To get a concrete flavor of ASN.1, we present an example data *schema*. Let us assume we are defining messages that are sent (TX) and received (RX) in a query-response protocol.

```
MsgTx ::= SEQUENCE {
  txID INTEGER(1..5),
  txTag UTF8String
}
MsgRx ::= SEQUENCE {
  rxID INTEGER(1..7),
  rxTag SEQUENCE(SIZE(0..10)) OF INTEGER
}
```
We have defined two top-level types, each a SEQUENCE type. A SEQUENCE is a named tuple of fields (like a C struct). The MsgTx sequence contains two fields: txID and txTag. These are typed with built-in ASN.1 types. In the definition of MsgRx, the second field, rxTag, has the SEQUENCE OF structured type; it is equivalent to an array of integers that can have a length between 0 and 10, inclusive. Note that the txID and rxID fields are *constrained* integers that fall into the given ranges.
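For intuition, the two SEQUENCE types could be modeled in Python roughly as follows (our own mapping; an ASN.1 compiler such as HAAW instead generates C structs and constraint checks):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MsgTx:
    txID: int    # INTEGER(1..5)
    txTag: str   # UTF8String

    def __post_init__(self):
        if not 1 <= self.txID <= 5:
            raise ValueError("txID out of range 1..5")

@dataclass
class MsgRx:
    rxID: int          # INTEGER(1..7)
    rxTag: List[int]   # SEQUENCE(SIZE(0..10)) OF INTEGER

    def __post_init__(self):
        if not 1 <= self.rxID <= 7:
            raise ValueError("rxID out of range 1..7")
        if len(self.rxTag) > 10:
            raise ValueError("rxTag longer than 10 elements")

msg = MsgTx(txID=1, txTag="Some msg")
print(msg)  # MsgTx(txID=1, txTag='Some msg')
```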

ASN.1 allows us to write values of defined types. The following is a value of type MsgTx:

```
msgTx MsgTx ::= {
  txID 1,
  txTag "Some msg"
}
```
#### **3.3 ASN.1 Security**

There are currently over 100 vulnerabilities associated with ASN.1 in the MITRE Common Vulnerability Enumeration (CVE) database [14]. These vulnerabilities cover many vendor implementations as well as encoders and decoders embedded in other software libraries (e.g., OpenSSL, Firefox, Chrome, OS X, etc.). The vulnerabilities often manifest as low-level programming vulnerabilities. A typical class of vulnerabilities is unallowed memory reads/writes, such as buffer overflows, over-reads, and NULL-pointer dereferences. While generally arcane, ASN.1 was recently featured in the popular press when an ASN.1 vendor flaw was found in telecom systems, ranging from cell tower radios to cellphone baseband chips [11]; an exploit could conceivably take down an entire mobile phone network.

Multiple aspects of ASN.1 combine to make ASN.1 implementations a rich source of security vulnerabilities. One reason is that many encode/decode pairs are hand-written and ad hoc. There are a few reasons for using ad-hoc encoders/decoders. While ASN.1 compilers exist that can generate encoders and decoders (we describe one in Sect. 4.1), many tools ignore portions of the ASN.1 specification or do not support all encoding standards, given the complexity and breadth of the language. A particular protocol may depend on ASN.1 language features or encodings unsupported by most existing tools. Tools that support the full language are generally proprietary and expensive. Finally, generated encoders/decoders might be too large, or might be incompatible with the larger system (e.g., a web browser) due to licensing or interface constraints.

Even if an ASN.1 compiler is used, the compiler will include significant handwritten libraries that deal with, e.g., serializing or deserializing base types and memory allocation. For example, the unaligned packed encoding rules (UPER) require tedious bit operations to encode types into a compact bit-vector representation. Indeed, the recent vulnerability discovered in telecom systems is not in protocol-specific generated code, but in the associated libraries [11].

Finally, because ASN.1 is regularly used in embedded and performance-critical systems, encoders/decoders are commonly written in unsafe languages, like C. As noted above, many of the critical security vulnerabilities in ASN.1 encoders/decoders are memory safety vulnerabilities in C.

# **4 Our Tools for Generating and Verifying ASN.1 Code**

We briefly introduce the two tools used in this work. First we introduce our ASN.1 compiler for generating the encode/decode pair, then we introduce the symbolic analysis engine used in the verification.

### **4.1 High-Assurance ASN.1 Workbench (HAAW)**

Our *High-Assurance ASN.1 Workbench* (HAAW) is a suite of tools developed by Galois that supports each stage of the ASN.1 protocol development lifecycle: specification, design, development, and evaluation. It is composed of an interpreter, compiler, and validator, albeit with varying levels of maturity. HAAW is implemented in Haskell.

The HAAW compiler is built using semi-formal design techniques and is thoroughly tested to help ensure correctness. The implementation of the HAAW compiler is structured to be as manifestly correct as feasible. It effectively imports a (separately tested) ASN.1 interpreter which is then "partially evaluated" on the fly to generate code. The passes are as follows: an input ASN.1 specification is "massaged" into a specification-like form which can be interpreted by a built-in ASN.1 interpreter; this specification-like form is combined with the interpreter code and converted into a lambda-calculus representation; to this representation we apply multiple optimization rules; we finally "sequentialize" to a monadic lambda-calculus (where we are left with the lambda calculus, sequencing operators, and encoding/decoding primitives). This last representation is then transformed into C code. The generated code is linked with a library that encodes and decodes the basic ASN.1 types.

Note that although the HAAW compiler improves the quality of the generated code, we verify the generated code and libraries directly, so HAAW itself is not part of the trusted code base.

### **4.2 The Software Analysis Workbench (SAW)**

The *Software Analysis Workbench* (SAW)<sup>1</sup> is Galois' open-source, state-of-the-art symbolic analysis engine for multiple programming languages. Here we briefly introduce SAW; see Dockins *et al.* [6] for more details.

An essential goal of SAW is to generate semantic models of programs independent of a particular analysis task and to interface with existing automated reasoning tools. SAW is intended to be mostly automated but supports user guidance to improve scalability.

The high-level architecture of SAW is shown in Fig. 1. At the heart of SAW is *SAWCore*. SAWCore is SAW's intermediate representation (IR) of programs. SAWCore is a dependently-typed functional language, providing a functional representation of the semantics of a variety of imperative and functional languages.

<sup>1</sup> saw.galois.com.

**Fig. 1.** SAW architecture, reproduced from [6].

SAWCore includes common built-in rewrite rules. Additionally, users can provide domain-specific rewrite rules, and because SAWCore is a dependently-typed language, rewrite rules can be given expressive types to prove their correctness.

SAW currently supports automated translation of both *low-level virtual machine* (LLVM) and *Java virtual machine* (JVM) into SAWCore. Thus, programming languages that can be compiled to these two targets are supported by SAW. Indeed, SAW can be used to prove the equivalence between programs written in C and Java.

SAWCore can also be generated from Cryptol. Cryptol is an open-source language<sup>2</sup> for the specification and formal verification of bit-precise algorithms [10], and we use it to specify portions of our code, as we describe in Sect. 6.

A particularly interesting feature of Cryptol is that it is a typed functional language, similar to Haskell, but includes a size-polymorphic type system that includes linear integer constraints. To give a feeling for the language, the concatenate operator (#) in Cryptol has the following type:

```
(#) : {fst, snd, a} (fin fst) => [fst]a -> [snd]a -> [fst + snd]a
```

It concatenates two sequences containing elements of type a, the first of length fst—which is constrained to be of finite (fin) length (infinite sequences are expressible in Cryptol)—and the second of length snd. The return type is a sequence of a's of length fst + snd. Cryptol relies on *satisfiability modulo theories* (SMT) solving for type-checking.

SAWCore is typically exported to various formats supported by external third-party solvers. This includes SAT solver representations (and-inverter graphs (AIG), conjunctive normal form (CNF), and ABC's format [3]), as well as SMT-Lib2 [2], supported by a range of SMT solvers.

<sup>2</sup> https://cryptol.net/.

SAW supports bit-precise reasoning about programs and has been used to prove the correctness of optimized cryptographic software [6]. SAW's bit-level reasoning is also useful for encode/decode verification; in particular, ASN.1's UPER encoding involves substantial bit-level operations.

Finally, SAW includes *SAWScript*, a scripting language that drives SAW and connects specifications with code.

# **5 Properties: Encode/Decode Self Consistency**

Ideally, we would prove full functional correctness for the encode/decode pair: that they correctly implement the ASN.1 UPER encoding/decoding rules for the ASN.1 types defined in SAE J2735. However, to develop a specification that would formalize all the required ASN.1 constructs, their semantics, and the proper UPER encoding rules would be an extremely large and tedious undertaking (decades of "man-years"?). Moreover, it is not clear how one would ensure the correctness of such a specification.

Instead of proving full functional correctness, we prove a weaker property by proving consistency between the encoder and decoder implementations. We call our internal consistency property *self-consistency*, which we define as the conjunction of two properties, *round-trip* and *rejection*. We show that self-consistency implies that decode is the inverse of encode, which is an intuitive property we want for an encode/decode pair.

The *round-trip property* states that a valid message that is encoded and then decoded results in the original message. This is a completeness property insofar as the decoder can decode all valid messages.

A less obvious property is the *rejection property*. The rejection property informally states that any invalid byte stream is rejected by the decoder. This is a soundness property insofar as the decoder *only* decodes valid messages.

In the context of general ASN.1 encoders/decoders, let us fix a schema *S* and an encoding rule. Let *M<sub>S</sub>* be the set of all ASN.1 abstract messages that satisfy the schema, and let *B* be the set of all finite byte streams. Let *enc<sub>S</sub>* : *M<sub>S</sub>* → *B* be an encoder, a total function on *M<sub>S</sub>*. Let *error* be a fixed constant such that *error* ∉ *M<sub>S</sub>*, and let the total function *dec<sub>S</sub>* : *B* → (*M<sub>S</sub>* ∪ {*error*}) be the corresponding decoder.

The round-trip and rejection properties can respectively be stated as follows:

### **Definition 1 (Round-trip)**

$$\forall m \in M_S.\; dec_S(enc_S(m)) = m.$$

**Definition 2 (Rejection)**

$$\forall b \in B.\; dec_S(b) = error \lor enc_S(dec_S(b)) = b.$$

The two properties are independent: a decoder could properly decode all valid byte streams while mapping some invalid byte streams to valid messages; such a decoder is allowed by Round-trip but not by Rejection. A failure of the Rejection property could also mean that *dec<sub>S</sub>* does not terminate normally on some inputs (note that *error* is a valid return value of *dec<sub>S</sub>*). Clearly, undefined behavior in the decoder is a security risk.

**Definition 3 (Self-consistency).** *An encode/decode pair enc<sub>S</sub> and dec<sub>S</sub> is* self-consistent *if and only if it satisfies the round-trip and rejection properties.*

Self-consistency does not require any reference to a specification of the ASN.1 encoding rules, simplifying the verification. Indeed, the properties are applicable to any encode/decode pair of functions.

However, as noted at the outset, self-consistency does not imply full functional correctness. For example, for an encoder *enc<sub>S</sub>* and decoder *dec<sub>S</sub>*, suppose the messages are *M<sub>S</sub>* = {*m*<sub>0</sub>, *m*<sub>1</sub>} and the byte streams include {*b*<sub>0</sub>, *b*<sub>1</sub>} ⊆ *B*. Suppose that according to the specification, it should be the case that *enc<sub>S</sub>*(*m*<sub>0</sub>) = *b*<sub>0</sub>, *enc<sub>S</sub>*(*m*<sub>1</sub>) = *b*<sub>1</sub>, *dec<sub>S</sub>*(*b*<sub>0</sub>) = *m*<sub>0</sub>, and *dec<sub>S</sub>*(*b*<sub>1</sub>) = *m*<sub>1</sub>, and for all *b* ∈ *B* such that *b* ≠ *b*<sub>0</sub> and *b* ≠ *b*<sub>1</sub>, *dec<sub>S</sub>*(*b*) = *error*. However, suppose that in fact *enc<sub>S</sub>*(*m*<sub>0</sub>) = *b*<sub>1</sub>, *enc<sub>S</sub>*(*m*<sub>1</sub>) = *b*<sub>0</sub>, *dec<sub>S</sub>*(*b*<sub>0</sub>) = *m*<sub>1</sub>, and *dec<sub>S</sub>*(*b*<sub>1</sub>) = *m*<sub>0</sub>, and for all other *b* ∈ *B*, *dec<sub>S</sub>*(*b*) = *error*. Then *enc<sub>S</sub>* and *dec<sub>S</sub>* satisfy both the round-trip and rejection properties, while being incorrect.

That said, if self-consistency holds, then correctness reduces to showing that either the encoder or the decoder matches its specification; showing both is unnecessary.

In our work, we formally verify self-consistency and memory safety. We also give further, informal, evidence of correctness both by writing individual test vectors and by comparing our test vectors to those produced by other ASN.1 compilers.

# **6 Verification**

Figure 2 summarizes the overall approach to generating and verifying the encode/decode pair, which we reference throughout this section.

### **6.1 First Steps**

The SAE J2735 ASN.1 specification (J2735.asn) is given as input to HAAW to generate C code for the encoder and decoder. A HAAW standard library is emitted (the dotted line from HAAW to libHAAW.c in Fig. 2 denotes that the standard library is not specific to the SAE J2735 specification and is not compiled from HAAW).

We wrote the round-trip and rejection properties (Sect. 5) as two C functions. For example, the round-trip property is encoded, approximately, as follows:

```
bool round_trip(BSM *msg_in) {
  unsigned char str[BUF_SIZE];
  enc(msg_in, str);
  BSM *msg_out;
  dec(msg_out, str);
  return equal_msg(msg_in, msg_out);
}
```

**Fig. 2.** Code generation and verification flow.
The actual round-trip property is slightly longer, as we need to deal with C-level setup, allocation, etc. This is why we chose to implement this property in C (rather than in SAWScript).

Now all we need to do is verify, in SAWScript, that the C function round_trip returns true *for all inputs*. At this point, it would be nice to say the power of our automated tools was sufficient to prove round_trip without further programmer intervention. This, unsurprisingly, was not the case. Most applications of SAW have been to cryptographic algorithms, where code typically has loops with statically known bounds. In our encoder/decoder code we have a number of loops with unbounded iterations; given such code, we need to provide some guidance to SAW.

In the following sections we present how we were able to use SAW, as well as our knowledge of our specific code, to change an intractable verification task into one that could be completed (by automated tools) in less than 5 h. An important note: the rest of this section describes SAW techniques that allow us to achieve tractability; they do not change the soundness of our results.

### **6.2 Compositional Verification with SAW Overrides**

SAW supports *compositional verification*. A function (e.g., compiled from Java or C) could be specified in Cryptol and verified against its specification. That Cryptol specification can then be used in analyzing the remainder of the program, such that in a symbolic simulation, the function is replaced with its specification. We call this replacement an *override*. Overrides can be used recursively and can dramatically improve the scalability of a symbolic simulation. SAW's scripting language ensures by construction that an override has itself been verified.

Overrides are like lemmas: we prove them once, separately, and can reuse them (without re-proof). The lemma that an override provides is an equivalence between a C function and a declarative specification provided by the user (in Cryptol). The effort of writing a specification and adding an override is often required to manage the intractability of the automated solvers used.

### **6.3 Overriding "copy_bits" in SAW**

There are two critical libHAAW functions that we found intractable to verify using naive symbolic simulation. Here we describe generating overrides for one of them:

```
int copy_bits
  ( unsigned char * dst
  , uint32_t *dst_i
  , unsigned char const * src
  , uint32_t *src_i
  , uint32_t const length)
{
  uint32_t src_i_bound = *src_i + length;
  while (*src_i < src_i_bound) {
    copy_overlapping_bits(dst, dst_i, src, src_i, src_i_bound);
  }
  return 0;
}
```
The above function copies length bits from the src array to the dst array, starting at the bit indexed by src_i in src and dst_i in dst; src_i and dst_i are incremented by the number of bits copied. copy_overlapping_bits is a tedious but loop-free function with bit-level computations to convert to/from a bit-field and byte array. This library function is called by both the encoder and decoder.

One difficulty with symbolically executing copy_bits with SAW is that SAW unrolls loops. Without a priori knowledge of the values of length and src_i, there is no upper bound on the number of iterations of the loop. Indeed, memory safety depends on an invariant holding between the indices, the number of bits to copy, and the length of the destination array: the length of the destination array is not passed to the function, so there is no explicit check preventing writes beyond the end of the destination array.

Even if we could fix the buffer sizes and specify the relationship between the length and indexes so that the loop could be unrolled in theory, in practice, it would still be computationally infeasible for large buffers. In particular, we would have to consider every valid combination of the length and start indexes, which is cubic in the bit-length of the buffers.

To override copy_bits, we write a specification of copy_bits in Cryptol. The specification does not abstract the function, other than eliding the details of pointers, pointer arithmetic, and destructive updates in C. The specification is given below:

```
copy_bits : {dst_n, src_n}
            [dst_n][8] -> [32] -> [src_n][8] -> [32] -> [32]
            -> ([dst_n][8], [32], [32])
copy_bits dst0 dst_i0 src src_i0 length = (dst1, dst_i1, src_i1)
  where
  dst_bits0 = join dst0
  src_bits0 = join src
  dst1 = split (copy dst_bits0 0)
  copy dst_bits i =
    if i == length
    then dst_bits
    else copy dst_bits'' (i + 1)
    where
    dst_bits'' = update dst_bits (dst_i0 + i)
                   (src_bits0 @ (src_i0 + i))
  dst_i1 = dst_i0 + length
  src_i1 = src_i0 + length
```
We refer the reader to the *Cryptol User Manual* for language details [10], but to provide an intuition, we describe the type signature (the first three lines above): the type is polymorphic, parameterized by dst_n and src_n. A type [32] is a bit-vector of length 32. A type [dst_n][8] is an array of length dst_n containing byte values. The function takes a destination array of bytes, a 32-bit destination index, a source array of bytes, a source index, and a length, and returns a triple containing a new destination array, and new destination and source indices, respectively. Because the specification is pure, the values that are destructively updated through pointers in the C implementation are part of the return value in the specification.

### **6.4 Multiple Overrides for "copy_bits" in SAW**

Even after providing the above override for copy_bits, we are *still* beyond the limits of our underlying solvers to automatically prove the equivalence of copy_bits with its Cryptol specification.

However, we realize that for the SAE J2735 encode/decode, copy_bits is called with a relatively small number of specific concrete values for the sizes of the dst and src arrays, the indexes dst_i and src_i, and the length of bits to copy, length. The only values that we need to leave symbolic are the bit values within the dst and src arrays. Therefore, rather than creating a single override for an arbitrary call to copy_bits, we generate separate overrides for each unique set of "specializable" arguments, i.e., dst_i, src_i, and length.

Thus we note another feature of SAW: SAW allows us to specify a set of concrete function arguments for an override; for each of these, SAW will specialize the override. (I.e., it will prove each specialization of the override.) In our case this turns one intractable override into 56 tractable ones. The 56 specializations (which corresponds to the number of SEQUENCE fields in the BSM specification) were not determined by trial and error but by running instrumented code.

It is important to note that a missing override specialization cannot change the soundness of SAW's result: overrides only change the efficiency of the proof search, not the proof's outcome. If we had a missing override specialization for copy_bits, we would only be back where we started: a property that takes "forever" to verify.

This approach works well for the simple BSM Part I. However, once we begin to verify encoders/decoders for more complex ASN.1 specifications (e.g., containing CHOICE and OPTIONAL constructs), this method will need to be generalized.

### **6.5 Results**

A SAW script (script.saw) ties everything together: it drives the symbolic execution in SAW, lifts LLVM variables and functions into a dependent logic to state pre- and post-conditions, and provides Cryptol specifications as needed. Finally, SAW generates an SMT problem; Z3 [5] is the default solver we use.

Just under 3100 lines of C code were verified, not counting blank or comment lines. The verification required writing just under 100 lines of Cryptol specification. There are 1200 lines of SAW script auto-generated by the test harness to produce the override specializations. Another 400 lines of SAW script are hand-written for the remaining overrides and to drive the overall verification.

Executed on a modern laptop with an Intel Core i7-6700HQ 2.6 GHz processor and 32 GB of memory, the verification takes 20 min to prove the round-trip property and 275 min to prove the rejection property. The round-trip property is less expensive to verify because symbolic simulation is sensitive to branching, and for the round-trip property, we assert the data is valid to start, which in turn ensures that all of the decodings succeed. In rejection, on the other hand, we have a branch at each primitive decode, and we need to consider both possibilities (success and failure).

# **7 Discussion**

### **7.1 LLVM and Definedness**

Note that our verification is with respect to the LLVM semantics, not the C source of our code. SAW does not model C semantics; it takes LLVM as the program's semantics (we use clang to generate LLVM from the C). By verifying LLVM, SAW is made simpler (it need only model LLVM semantics rather than C), and we can do inter-program verification more easily. The process of proving that a program satisfies a given specification within SAW guarantees definedness of the program (and therefore memory safety) as a side effect. That is, the translation from LLVM into SAWCore provides a well-defined semantics for the program, and this process can only succeed if the program is well-defined. In some cases, this well-definedness is assumed during translation and then proved in the course of the specification verification. For instance, when analyzing a memory load, SAW generates a semantic model of what the program does if the load is within the bounds of the object it refers to, and generates a side condition that the load is indeed in bounds.

Verifying LLVM rather than the source program is a double-edged sword. On the one hand, the compiler front-end that generates LLVM is removed from the trusted computing base. On the other hand, the verification may not be sound with respect to the program's source semantics. In particular, C's undefined behaviors are a superset of LLVM's undefined behaviors; a compiler can soundly remove undefined behaviors but not introduce them. For example, a flaw in the GCC compiler allowed the potential for an integer overflow when multiplying the size of a storage element by the number of elements. The result could be insufficient memory being allocated, leading to a subsequent buffer overflow. clang, however, introduces an implicit trap on overflow [12].

Moreover, the LLVM language reference does not rigorously specify well-definedness, and it is possible that our formalization of LLVM diverges from a particular compiler's [13].

### **7.2 Other Assumptions**

We made some memory safety assumptions about how the encode/decode routines are invoked. First, we assume that the input and output buffers provided to the encoder and decoder, respectively, do not alias. We also assume that each buffer is 37 bytes long (sufficient to hold a BSM with Part I only). A meta argument shows that buffers of *at least* 37 bytes are safe: we verify that for all 37-byte buffers, we never read or write past their ends. So, if the buffers were longer, we would never read the bytes above the 37th element.

For more complex data schemas (and when we extend to BSM Part II) whose messages require a varying octet size, we would need to ensure the buffers are sufficiently large for all message sizes.

### **7.3 Proof Robustness**

By "proof robustness" we mean how much effort is required to verify another protocol or changes to the protocol. We hypothesize that for other protocols that use UPER and a similar set of ASN.1 constructs, the verification effort would be small. Most of our manual effort focused on the libHAAW libraries, which is independent of the particular ASN.1 protocol verified. That said, very large protocol specifications may require additional proof effort to make them compositional.

In future work, we plan to remove the need to generate overrides as a separate step (as described in Sect. 6.2) by modifying HAAW to generate overrides as it generates the C code.

# **8 Conclusion**

We hope to have motivated the security threat to V2V and the need to eliminate vulnerabilities in ASN.1 code. We have presented a successful application of automated formal methods to real C code for a real-world application domain.

There are some lessons to be learned from this work:


There are many ways we hope to extend this work:

(1) We plan to extend our verification to the full BSM. This brings us to more challenging ASN.1 constructs (e.g., CHOICE) that involve more complicated control flow in the encoders/decoders. We do not expect a proof to be found automatically, but our plan is to generate lemmas with the generated C code that will allow proofs to go through automatically.


**Acknowledgments.** This work was performed under subcontract to Battelle Memorial Institute for the National Highway Safety Transportation Administration (NHTSA). We thank Arthur Carter at NHTSA and Thomas Bergman of Battelle for their discussions and guidance. Our findings and opinions do not necessarily represent those of Battelle or the United States Government.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Continuous Formal Verification of Amazon s2n**

Andrey Chudnov<sup>1</sup>, Nathan Collins<sup>1</sup>, Byron Cook<sup>3,4</sup>, Joey Dodds<sup>1</sup>, Brian Huffman<sup>1</sup>, Colm MacCárthaigh<sup>3</sup>, Stephen Magill<sup>1(B)</sup>, Eric Mertens<sup>1</sup>, Eric Mullen<sup>2</sup>, Serdar Tasiran<sup>3</sup>, Aaron Tomb<sup>1</sup>, and Eddy Westbrook<sup>1</sup>

> <sup>1</sup> Galois, Inc., Portland, USA stephen@galois.com <sup>2</sup> University of Washington, Seattle, USA <sup>3</sup> Amazon Web Services, Seattle, USA <sup>4</sup> University College London, London, UK

**Abstract.** We describe formal verification of s2n, the open source TLS implementation used in numerous Amazon services. A key aspect of this proof infrastructure is continuous checking, to ensure that properties remain proven during the lifetime of the software. At each change to the code, proofs are automatically re-established with little to no interaction from the developers. We describe the proof itself and the technical decisions that enabled integration into development.

# **1 Introduction**

The Transport Layer Security (TLS) protocol is responsible for much of the privacy and authentication we enjoy on the Internet today. It secures our phone calls, our web browsing, and connections between resources in the cloud made on our behalf. In this paper we describe an effort to prove the correctness of s2n [3], the open source TLS implementation used by many Amazon and Amazon Web Services (AWS) products (*e.g.* Amazon S3 [2]). Formal verification plays an important role for s2n. First, many security-focused customers (*e.g.* financial services, government, pharmaceutical) are moving workloads from their own data centers to AWS. Formal verification provides customers from these industries with concrete information about *how* security is established in Amazon Web Services. Secondly, automatic and continuous formal verification facilitates rapid and cost-efficient development by a distributed team of developers.

In order to realize the second goal, verification must continue to work with low effort as developers change the code. While fundamental advances have been made in recent years in the tractability of full verification, these techniques generally either: (1) target a fixed version of the software, requiring significant reproof effort whenever the software changes or, (2) are designed around synthesis of correct code from specifications. Neither of these approaches would work for Amazon as s2n is under continuous development, and new versions of the code would not automatically inherit correctness from proofs of previous versions.

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 430–446, 2018. https://doi.org/10.1007/978-3-319-96142-2\_26

To address the challenge of program proving in such a development environment, we built a proof and associated infrastructure for s2n's implementations of DRBG, HMAC, and the TLS handshake. The proof targets an existing implementation and is updated either automatically or with low effort as the code changes. Furthermore, the proof connects with existing proofs of security properties, providing a high level of assurance.

Our proof is now deployed in the continuous integration environment for s2n, and provides a distributed team of developers with repeated proofs of the correctness of s2n even as they continue to modify the code. In this paper, we describe how we structured the proof and its supporting infrastructure, so that the lessons we learned will be useful to others who address similar challenges.

Figure 1 gives an overview of our proof for s2n's implementation of the HMAC algorithm and the tooling involved. At the left is the ultimate security property of interest, which for HMAC is that if the key is not known, then HMAC is indistinguishable from a random function (given some assumptions on the underlying hash functions). This is a fixed security property for HMAC and almost never changes (a change would correspond to some new way of thinking about security in the cryptographic research community). The HMAC specification is also fairly static, having been updated only once since its publication in 2002<sup>1</sup>. Beringer et al. [6] have published a mechanized formal proof that the high-level HMAC specification establishes the cryptographic security property of interest.

As we move to the right through Fig. 1, we find increasingly low-level artifacts and the rate of change of these artifacts increases. The low-level HMAC specification includes details of the API exposed by the implementation, and the implementation itself includes details such as memory management and performance optimizations. This paper focuses on verifying these components in a manner that uses proof automation to decrease the manual effort required for ongoing maintenance of these verification artifacts. At the same time, we ensure that the automated proof occurring on the right-hand side of the figure is linked to the stable, foundational security results present at the left.

In this way, we realize the assurance benefit of the foundational security work of Beringer et al. while producing a proof that can be integrated into the development workflow. The proof is applied as part of the *continuous integration* system for s2n (which uses Travis CI) and runs every time a code change is pushed or a pull request is issued. In one year of code changes only three manual updates to the proof were required.

The s2n source code, proof scripts, and access to the underlying proof tools can all be found in the s2n GitHub [3] repository. The collection of proof runs is logged and appears on the s2n Travis CI page [4].

In addition to the HMAC proof, we also reused the approach shown in the right-hand side of Fig. 1 to verify the deterministic random bit generator (DRBG) algorithm and the TLS Handshake protocol. In these cases we didn't link to foundational cryptographic security proofs, but nonetheless had specifications that provided important benefits to developers by allowing them to (1)

<sup>1</sup> And this update did not change the functional behavior specified in the standard.

check their code against an independent specification and (2) check that their code continues to adhere to this specification as it changes. Our TLS Handshake proof revealed a bug (which was promptly fixed) in the s2n implementation [10], providing evidence for the first point. All of our proofs have continued to be used in development since their introduction, supporting the second point.

**Fig. 1.** An overview of the structure of our HMAC proof.

**Related Work.** Projects such as Everest [8,12], Cao [5], and Jasmin [1] generate verified cryptographic implementations from higher-level specifications, *e.g.* F\* models. While progress in this space continues to be promising (HACL\* has recently achieved performance on primitives that surpasses handwritten C [25]), we have found in our experiments that the generated TLS code does not yet meet the performance, power, and space constraints required by the broad range of AWS products that use s2n.

Static analysis of hand-written cryptographic implementations has been previously reported in the context of Frama-C/PolarSSL [23], focusing on scaling memory safety verification to a large body of code. Additionally, unsound but effective bug-hunting techniques such as fuzzing have been applied to TLS implementations in the past [11,18]. The work we report on goes further by proving behavioral correctness properties of the implementation that are beyond the capabilities of these techniques. In this we were helped by the fact that the implementation of s2n is small (less than 10k LOC) and most of its iteration is bounded.

The goal of our work is to verify deep properties of an existing and actively developed open source TLS implementation that has been developed for both high performance and low power, as required by a diverse range of AWS products. Our approach was guided by lessons learned in several previous attempts to prove the correctness of s2n that either (1) required too much developer interaction during the modification of the code [17], or (2) relied on push-button symbolic model checking tools that did not scale. Similarly, proofs developed using tools from the Verified Software Toolchain (VST) [6] are valuable for establishing the correctness and security of specifications, but are not sufficiently resilient to code changes, making them challenging to integrate into an ongoing development process. Their use of a layered proof structure, however, provided us with a specification that we could use to leverage their security proof in our work.

O'Hearn details the industry impact of continuous reasoning about code in [19], and describes additional instances of integration of formal methods with developer workflows.

# **2 Proof of HMAC**

In this section, we walk through our HMAC proof in detail, highlighting how the proof is decomposed, the guarantees provided, the tools used, and how this approach supports integration of verification into the development workflow. While HMAC serves as an example, we have also performed a similar proof of the DRBG and TLS Handshake implementations. We do not discuss DRBG further, as there are no proof details that differ significantly from HMAC. We describe our TLS verification in Sect. 3.

### **2.1 High-Level HMAC Specification**

The keyed-Hash Message Authentication Code algorithm (HMAC) is used for authenticated integrity in TLS 1.2. Authenticated integrity guarantees that the data originated from the sender and was not changed or duplicated in transit. HMAC is used as the foundation of the TLS Pseudorandom Function (PRF), from which the data transmission and data authentication shared keys are derived. This ensures that both the sender and recipient have exchanged the correct secrets before a TLS connection can proceed to the data transmission phase.

HMAC is also used by some TLS cipher suites to authenticate the integrity of TLS records in the data transmission phase. This ensures, for example, that a third party watching the TLS connection between a user and a webmail client is unable to change or repeat the contents of an email body during transmission. It is also used by the HMAC-based Extract-and-Expand Key Derivation Function (HKDF), which is implemented within s2n as a utility function for general-purpose key derivation and is central to the design of the TLS 1.3 PRF.

FIPS 198-1 [24] defines the HMAC algorithm as

$$\mathsf{HMAC}(K, message) = \mathsf{H}((K \oplus opad) \| \mathsf{H}((K \oplus ipad) \| message))$$

where H is any hash function, ⊕ is bitwise xor, and ∥ is concatenation. *opad* and *ipad* are constants defined by the specification. We will refer to this definition as the *monolithic* specification.
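As a concrete illustration (not part of the s2n development), the monolithic definition can be transcribed into a short Python sketch and checked against the standard library's hmac module. All names here are ours, and SHA-256 stands in for the generic H:

```python
# Sketch of the FIPS 198-1 monolithic definition,
# H((K xor opad) || H((K xor ipad) || message)), with SHA-256 as H.
# Illustrative only; names are ours, not from s2n or its Cryptol specs.
import hashlib
import hmac as stdlib_hmac

BLOCK = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    if len(key) > BLOCK:                     # overlong keys are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK, b"\x00")          # short keys are zero-padded
    ipad_key = bytes(b ^ 0x36 for b in key)  # K xor ipad
    opad_key = bytes(b ^ 0x5c for b in key)  # K xor opad
    inner = hashlib.sha256(ipad_key + message).digest()
    return hashlib.sha256(opad_key + inner).digest()

# Agreement with the standard library implementation:
assert hmac_sha256(b"k" * 20, b"hello") == \
    stdlib_hmac.new(b"k" * 20, b"hello", hashlib.sha256).digest()
```

The key-padding and key-hashing branches mirror the cases that the implementation proof later needs to cover at different key sizes.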

Following Fig. 1, we use the Cryptol specification language [14] to express HMAC in a form suitable for mechanized verification, first in a monolithic form, and then in an incremental form. We prove high-level properties with Coq [22] and tie these to the code using the Software Analysis Workbench (SAW) [16]. We first describe the proof of high-level properties before going into specifics regarding the tools in Sect. 2.4.

### **2.2 Security Properties of HMAC**

The Cryptol version of the monolithic HMAC specification follows.

```
hmac k message = H((k ^ opad) # H((k ^ ipad) # message))
```
where H is any hash function, ^ is bitwise xor, and # is concatenation.

The high-level Cryptol specification and the FIPS document look nearly identical, but what assurance do we have that either description of the algorithm is cryptographically secure? We can provide this assurance by showing that the Cryptol specification establishes one of the security properties that HMAC is intended to provide—namely, that HMAC is indistinguishable from a function returning random bits.

Indistinguishability from random is a property of cryptographic output that says that there is no effective strategy by which an attacker that is viewing the output of the cryptographic function and a true random output can distinguish the two, where an "effective" strategy is one that has a non-negligible chance of success given bounded computing resources. If the output of a cryptographic function is indistinguishable from random, that implies that no information can be learned about the inputs of that function by examining the outputs.

We prove that our Cryptol HMAC specification has this indistinguishability property using an operational semantics of Cryptol that we developed in Coq. The semantics enable us to reuse portions of the proof by Beringer et al. [6], which uses the Coq Foundational Cryptography Framework (FCF) library [20] to establish the security of the HMAC construction. We construct a Coq proof showing that our Cryptol specification is equivalent (when interpreted using the formal operational semantics) to the specification considered in the work of Beringer et al. The Cryptol specification is a stepping stone to automated verification of the s2n implementations, so when combined with the verification work we describe subsequently, we eventually establish that the implementation of HMAC in s2n also has the desired security property. The Coq code directly relating to HMAC is all on the s2n GitHub page. These proofs are not run as part of continuous integration; rather, they are rerun only in the unlikely event that the monolithic specification changes.

### **2.3 Low-Level Specification**

The formal specification of HMAC presented in the FIPS standard operates on a single *complete* message. However, network communication often requires the incremental processing of messages. Thus all modern implementations of HMAC provide an incremental interface with the following abstract types:

```
init : Key -> State
update : Message -> State -> State
digest : State -> MAC
```
The init function creates a state from a key, the update function updates that state incrementally with chunks of the message, and the digest function finalizes the state, producing the MAC.
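For illustration, Python's standard library exposes the same incremental shape, which makes the intended init/update/digest contract easy to see (a sketch with our own example values; the real interface is the C one in s2n):

```python
# Sketch of the incremental interface using Python's stdlib hmac object
# (illustrative; the s2n interface is the C one discussed in the text).
import hashlib
import hmac

key = b"secret-key"
m1, m2 = b"first chunk, ", b"second chunk"

state = hmac.new(key, digestmod=hashlib.sha256)  # init : Key -> State
state.update(m1)   # update : Message -> State -> State (in place here)
state.update(m2)
incremental = state.digest()                     # digest : State -> MAC

# The monolithic interface applied to the whole message agrees:
monolithic = hmac.new(key, m1 + m2, hashlib.sha256).digest()
assert incremental == monolithic
```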

The one-line monolithic specification is related to these incremental functions as follows. If we can partition a message *m* into *m* = *m*<sub>1</sub> ∥ *m*<sub>2</sub> ∥ … ∥ *m*<sub>n</sub>, then (in pseudo-code/logic notation)

$$\mathsf{HMAC}(k, m) = \mathsf{digest}(\mathsf{update}\ m\_n\ (\dots(\mathsf{update}\ m\_1\ (\mathsf{init}\ k))\dots))\tag{1}$$

In other words, any MAC generated by partitioning a message and incrementally sending it in order through these functions should be equal to a MAC generated by the complete message HMAC interface used in the specification.

We prove that the incremental interface to HMAC is equivalent to the non-incremental version using a combination of manual proof in Coq and automated proof in Cryptol. Note that this equivalence property can be stated in an implementation-independent manner and proved outside of a program verification context. This is the approach we take—independently proving that the incremental and monolithic message interfaces compute the same HMAC, and then separately showing that s2n correctly implements the incremental interface.

Our Coq proof proceeds via induction over the number of partitions with the following lemmas establishing the relationship between the monolithic and iterative implementations. These lemmas are introduced as axioms in the Coq proof, but subsequently checked using SAW.

```
update_empty : forall s, HMAC_update empty_string s = s.
equiv_one : forall m k,
 HMAC_digest (HMAC_update m (HMAC_init k)) = HMAC k m.
update_concat : forall m1 m2 s,
 HMAC_update (concat m1 m2) s = HMAC_update m2 (HMAC_update m1 s).
```
The first lemma states that processing an empty message does not change the state. The second lemma states that applying the incremental interface to a single message is equivalent to applying the monolithic interface. These lemmas constitute the base cases for an inductive proof of equation (1) above. The last lemma states that calling update twice (first with m1 and then with m2) results in the same state as calling update once with m1 concatenated with m2. This constitutes the inductive step in the proof of (1).
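The three lemmas can be spot-checked on concrete values using the incremental interface of Python's stdlib hmac module, observing state equality through digests (an illustrative sketch; the helper names are ours, not the Coq development's):

```python
# Concrete spot-checks of the three lemmas using the stdlib hmac module.
# State equality is observed via digests; helper names are ours.
import hashlib
import hmac

key, m1, m2 = b"k" * 16, b"abc", b"defgh"

def init(k):
    return hmac.new(k, digestmod=hashlib.sha256)

def update(m, s):
    s = s.copy()      # keep the original state unchanged, like the spec
    s.update(m)
    return s

def digest(s):
    return s.digest()

# update_empty: processing the empty message leaves the state unchanged
assert digest(update(b"", init(key))) == digest(init(key))

# equiv_one: one update then digest equals the monolithic interface
assert digest(update(m1, init(key))) == \
    hmac.new(key, m1, hashlib.sha256).digest()

# update_concat: update m2 (update m1 s) = update (m1 ++ m2) s
s = init(key)
assert digest(update(m2, update(m1, s))) == digest(update(m1 + m2, s))
```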

The update\_empty lemma can be proved by analyzing the code with symbolic values provided for the state *s*, as the state is of fixed size. The equiv\_one and update\_concat lemmas require reasoning about unbounded data. SAW has limited support for such proofs. In particular, it has support for equational rewriting of terms in its intermediate language, but not for induction. In the case of the update\_concat lemma, a few simple built-in rewrite rules are sufficient to establish the statement for all message sizes. For equiv\_one, a proof of the statement for all message sizes would require induction. Since SAW does not support induction, we prove that this statement holds for a finite number of key and message sizes. In theory, we could still obtain a complete proof by checking all message sizes up to 16k bytes (the maximum message size permitted by the TLS standard). This may be tractable in a one-off proof, but for our continuously applied proofs we instead consider a smaller set of samples, chosen to cover all branches in the code. This yields a result that falls short of full proof, but still provides much higher state space coverage than testing methods.

Given the three lemmas above, we then use Coq to prove the following theorem by induction on the list of partitions, ms.

```
HMAC key (fold_right concat empty_string ms) =
   HMAC_digest (fold_left (fun (st: state) msg =>
                                 HMAC_update msg st)
                            ms
                          (HMAC_init key)).
```
The theorem establishes the equivalence of the incremental and monolithic interfaces for any decomposition of a message into any number of fragments of any size.
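A concrete analogue of the theorem: folding update over any list of fragments agrees with the monolithic HMAC of their concatenation (a Python sketch under our own naming; the fragment list is arbitrary):

```python
# Folding update over an arbitrary fragment list agrees with the
# monolithic HMAC of the concatenated message (names are ours).
import functools
import hashlib
import hmac

def incremental_hmac(key, fragments):
    def step(state, msg):
        state.update(msg)
        return state
    init = hmac.new(key, digestmod=hashlib.sha256)
    return functools.reduce(step, fragments, init).digest()

key = b"key-bytes"
ms = [b"", b"a", b"bc", b"def", b"g" * 100]  # any partition works
assert incremental_hmac(key, ms) == \
    hmac.new(key, b"".join(ms), hashlib.sha256).digest()
```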

# **2.4 Implementation Verification**

The incremental Cryptol specification is low-level enough that we were able to connect it to the s2n HMAC implementation using automated proof techniques. As this is the aspect of the verification effort that is critical for integration into an active development environment, we go into some detail, first discussing the tools that were used and then describing the structure of the proof.

**Tools.** We use the Software Analysis Workbench (SAW) to orchestrate this step of the proof. SAW is effective both for manipulating the kinds of functional terms that arise from Cryptol, and for constructing functional models from imperative programs. It can be used to show equivalence of distinct software implementations (*e.g.* an implementation in C and one in Java) or equivalence of an implementation and an executable specification.

SAW uses bounded symbolic execution to translate Cryptol, Java, and C programs into logical expressions, and proves properties about the logical expressions using a combination of rewriting, SAT, and SMT. The result of the bounded symbolic execution of the input programs is a pure functional term representing the function's entire semantics. These extracted semantics are then related to the Cryptol specifications by way of precondition and postcondition assertions on the program state.

The top-level theorems we prove have some variables that are universally quantified (e.g. the key used in HMAC) and others that are parameters we instantiate to a constant (e.g. the size of the key). We achieve coverage for the latter by running the proof for several parameter instantiations. In some cases this is sufficient to cover all cases (e.g. the standard allows only a small finite number of key sizes). In others, the space of possible instantiations is large enough that fully covering it would yield runtimes too long to fit into the developer workflow (for example, messages can be up to 16k long). In such cases, we consider a smaller set of samples, chosen to cover all branches in the code. This yields a result that is short of full proof, but still provides much higher state space coverage than testing methods.
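The sampling strategy can be sketched as follows: rather than all sizes up to the TLS maximum, check a handful of message sizes chosen around branch-relevant boundaries such as the hash block size (the particular sizes below are our illustrative choice, not the ones used in the s2n proof scripts):

```python
# Branch-covering size sampling: check a handful of message sizes
# around the SHA-256 block boundary instead of every size up to 16k.
# The specific sizes are our illustrative choice.
import hashlib
import hmac

BLOCK = 64
sample_sizes = [0, 1, BLOCK - 1, BLOCK, BLOCK + 1, 2 * BLOCK, 1000]

key = b"\x01" * 32
for n in sample_sizes:
    msg = (bytes(range(256)) * (n // 256 + 1))[:n]
    h = hmac.new(key, digestmod=hashlib.sha256)
    h.update(msg)   # incremental interface at this message size...
    # ...must agree with the monolithic interface
    assert h.digest() == hmac.new(key, msg, hashlib.sha256).digest()
```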

Internally SAW reasons about C programs by first translating them to LLVM. For the remainder of the paper we will talk about the C code, although from a soundness perspective the C code must be compiled through LLVM for the proofs to apply to the compiled code.

**Proof Structure.** The functions in the low-level Cryptol specification described above share the incremental format of the C program, and also consume arguments and operate on state that matches the usage of arguments and state in the C code. However, the Cryptol specification does not capture the layout of state in memory. This separates concerns and allows us to reason about equivalence of the monolithic and incremental interfaces in a more tractable purely functional setting, while performing the implementation proof in a context in which the specification and implementation are already structurally quite similar.

As an example of this structural similarity, the C function has type:

```
int s2n_hmac_update(struct s2n_hmac_state *state,
                    const void *in, uint32_t size);
```
We define a corresponding Cryptol specification with type:

```
hmac_update : {Size} (32 >= width Size) =>
              HMAC_state -> [Size][8] -> HMAC_state
```
These type signatures look a bit different, but they represent the same thing. In Cryptol, we list Size first because it is a type, not a value. This means that we do not need to independently check that the input buffer (represented in Cryptol by the type [Size][8]) matches the size input—the Cryptol type system guarantees it. The type system also enforces that the size fits in 32 bits, matching the uint32\_t size parameter of the C function.

We use SAW's SAWScript language to describe the expected memory layout of the C program, and to map the inputs and outputs of the Cryptol function to the inputs and outputs of the C program. The following code presents the SAWScript for the hmac\_update\_spec function.

```
1 let hmac_update_spec msg_size cfg = do {
2 (msg_val, msg_pointer) <- ptr_to_fresh_array msg_size i8;
3 (initial_state, state_pointer) <- setup_hmac_state cfg;
4 hmac_invariants initial_state cfg;
5
6 execute_func [state_pointer, msg_pointer, msg_size];
7
8 let final_state =
9 {{ hmac_update_c_state initial_state msg_val }};
10 check_hmac_state state_pointer final_state;
11 hmac_invariants final_state cfg;
12 check_return zero;
13 };
```
This SAWScript code represents a Hoare triple, with the precondition and postcondition separated by the body (the execute\_func command), which performs the symbolic execution of the LLVM code using the provided arguments. Lines 2 and 3 are effectively universal quantification over the triple, setting up the values and pointers that match the type needed by the C function. The values msg\_val and initial\_state are referenced in both the C code and the Cryptol specification, whereas the pointers exist only on the C side.

Lines 8–10 capture that the final state resulting from executing the C function should be equivalent to the state produced by evaluating the Cryptol specification. Specifically, Lines 8 and 9 capture the output of the Cryptol specification (double curly braces denote Cryptol expressions within SAWScript) and Line 10 asserts that this state matches the C state present in memory at state\_pointer. This is what ultimately establishes equivalence of the implementation and specification.

The proof is aided by maintaining a collection of state invariants, which are assumed to hold in Line 4 and are re-established in Line 11. These are manual invariants, but they occur as function specifications rather than appearing internal to loops. They only require modification in the event that the meaning of the HMAC state changes.

The msg\_size parameter indicates how large a message this particular proof should cover. Because SAW performs a bounded unrolling of the program under analysis, each proof must cover one fixed size for each unbounded data structure or iterative construct. However, by parameterizing the proof, it can easily be repeated for multiple sizes. Furthermore, as described in Sect. 2.3, we also prove in Coq that calling update twice with messages *m*<sub>1</sub> and *m*<sub>2</sub> is equivalent to calling it once with *m*<sub>1</sub> concatenated with *m*<sub>2</sub>. As a consequence, the fixed-size proofs we perform of update can be composed to guarantee that the update function is correct even over longer messages.

The cfg parameter contains configuration values for each of the six hashes that can be used with HMAC. The configuration values of interest to HMAC are the input and output sizes of the hash block function.

Given the specification of the C function above, we can now verify that the implementation satisfies the specification:

```
verify m "s2n_hmac_update"
  hash_ovs true (hmac_update_spec msg_size cfg) yices_hash_unint;
```
The "s2n\_hmac\_update" argument specifies the C function that we are verifying. hash\_ovs is a list, defined elsewhere, that contains all of the *overrides* that the verification will use. An override is a specification that will be used in place of a particular implementation function and corresponds to what other tools call *stubs* or *models*. In this case, we've overridden all of the C hash functions, stating assumptions regarding their use of memory and their equivalence to Cryptol implementations of the same hash functions. When the verifier comes across a call to one of these hash functions in the C code, it will instead use the provided specification. The result is that our proof *assumes correct implementation of the hash functions*.

The fact that the structure of the low-level Cryptol specification matches the structure of the C code, coupled with SAW's use of SMT as the primary mechanism for discharging verification conditions, enables a proof that continues to work through a variety of code changes. In particular, changes to the code in function bodies often require no corresponding specification or proof script change. Similarly, changes that add fields or change aspects of in-memory data structures that are not referenced by the specification do not require proof updates. Changes in the API (e.g. function arguments) do require proof script changes, but these are typically minor. Fixing a broken proof typically involves adding a new state field to the SAW script, updating the Cryptol specification to use that field correctly, and then passing the value of that field into the Cryptol program in the postcondition. If the Cryptol specification is incorrect, SAW will generate counterexamples that can be used to trace through the code and the spec together in order to discover the mismatch.

### **2.5 Integrating the Proof into Development**

Integration with the s2n CI system mostly took place within the Travis configuration file for s2n. At the time of integration, targets for the build, integration testing, and fuzzing on both Linux and OSX already existed. We updated the Travis system with Bash scripts that automatically download and install the appropriate builds of SAW, Z3, and Yices into the Travis system. These files are in the s2n repository and can be reused by anyone under the Apache 2.0 license.

A Travis CI build can occur on any number of virtual machines, and each virtual machine is given an hour to complete. We run our HMAC proofs on configurations for six different hashes. For each of these configurations we check three key sizes in order to test the relevant cases in the implementation (small keys get padded, exact-length keys remain unchanged, and large keys are hashed). For each of those key sizes we check six different message sizes. These proofs run in an average of ten minutes. We discovered that it is best to stay well clear of the 60-minute limit imposed by Travis in order to avoid false negatives due to variations in execution time.

The proof runs alongside the tests that are present in the s2n repository on every build, and if the proof fails a flag is raised just as if a test case were to fail.

### **3 Proof of TLS Handshake**

In addition to the HMAC and DRBG proofs, we have proved the correctness of the TLS state machine implemented in s2n. Specifically, we have proved that (1) it implements a subset of TLS 1.2 as defined in IETF RFCs 5246 [21], 5077 [15], and 6066 [13], and (2) the socket corking API, which optimizes how data is split into packets, is used correctly. Formally, we proved that the implementation *refines* a specification (conversely, the specification *simulates* the implementation). We obtained this Cryptol specification, called the *RFC specification*, by examining the RFCs and hand-compiling them into a Cryptol file complete with relevant excerpts from the RFCs. We assume that the TLS handshake as specified in the RFCs is secure, and do not formalize or verify any cryptographic properties of the specification. In the future, we would like to take an approach similar to that described in Sect. 2.2 to link our refinement proof with a specification-level security proof for TLS, such as that from miTLS [9].

The s2n state machine is designed to ensure correctness and security, preventing join-of-state-machines vulnerabilities like SMACK [7]. In addition, s2n allows increased throughput via the use of TCP socket corking, which combines several TLS records into one TCP frame where appropriate.

The states and transitions of the s2n state machine are encoded explicitly as linearized arrays, as opposed to being intertwined with message parsing and other logic. This is an elegant decomposition of the problem that makes most of the assumptions explicit and enables the use of common logic for message and error handling as well as protocol tracking.

Even with the carefully designed state machine implementation, formal specification and verification helped uncover a bug [10].

**Structure of the TLS Handshake State Machine Correctness Proof.** The automated proof of correctness of the TLS state machine has two parts (Fig. 2). First we establish an equivalence between the two functions<sup>2</sup> that drive the TLS handshake state machine in s2n and their respective specifications in Cryptol. Again we utilize *low-level* specifications that closely mirror the shape of the C functions. Our end goal, however, is correctness with respect to the standards, encoded in the *RFC specification* in Cryptol. The library implements only a subset of the standards, thus we can only prove a simulation relation and not equivalence. Namely, we show that every sequence of messages generated by the low-level specification starting from a valid initial state can be generated by the RFC specification starting from a related state. The dashed line in Fig. 2 shows at which points the states match at the implementation and specification levels.

**Fig. 2.** Structure of the TLS handshake correctness proof

<sup>2</sup> s2n\_conn\_set\_handshake\_type and s2n\_advance\_message.

**RFC-Based Specification of the TLS Handshake.** The high-level handshake protocol specification that captures the TLS state machine is implemented in Cryptol and accounts for the protocol, message type and direction, as well as conditions for branching in terms of abstract connection parameters, but not message contents.

We represent the set of states as unsigned 5-bit integers (Listing 1). The state transition relation is represented by a Cryptol function handshakeTransition (Listing 2) which, given abstract connection parameters (Listing 3) and the current state, returns the next state. If there is no valid next state, the state machine stutters. The parameters determine the transition to take in each state and represent configurations of the end-points as well as the contents of the HELLO message sent by the other party. We kept the latter separate from the message specifications in order to avoid reasoning about message structure and parsing. We can still relate the abstract parameters to the implementation because they are captured in the connection state. Finally, the message function (Listing 4) gives the message type, protocol and direction for every state.

```
type State = [5]
(helloRequestSent : State) = 0
(clientHelloSent : State) = 1
(serverHelloSent : State) = 2
// ...
(serverCertificateStatusSent : State) = 23
```
Listing 1: Specification of TLS handshake protocol states

```
handshakeTransition : Parameters -> State -> State
handshakeTransition params old =
  snd (find fst (True, old) [ (old == from /\ p, to)
                            | (from, p, to) <- valid_transitions]) where
  valid_transitions =
    [(helloRequestSent, True, clientHelloSent)
    ,(clientHelloSent, True, serverHelloSent)
    ,(serverHelloSent, params.keyExchange != DH_anon
                   /\ ~params.sessionTicket, serverCertificateSent)
    // ...
    ,(serverCertificateStatusSent, ~(keyExchangeNonEphemeral params)
      , serverKeyExchangeSent)
    ]
```
Listing 2: Specification of the TLS handshake state transition function. Valid transitions are encoded as triples (*start, transition condition, end*).
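The find-first semantics of Listing 2 can be mimicked in a few lines of Python (an illustrative re-encoding; the state numbering beyond the excerpted constants is our assumption):

```python
# Find-first transition semantics of Listing 2, re-encoded in Python.
# Only the excerpted transitions are shown; numeric values beyond the
# listings are our assumption.
HELLO_REQUEST_SENT, CLIENT_HELLO_SENT, SERVER_HELLO_SENT = 0, 1, 2
SERVER_CERTIFICATE_SENT = 3   # illustrative numbering
DH_ANON = 0

def handshake_transition(params, old):
    valid_transitions = [
        (HELLO_REQUEST_SENT, True, CLIENT_HELLO_SENT),
        (CLIENT_HELLO_SENT, True, SERVER_HELLO_SENT),
        (SERVER_HELLO_SENT,
         params["keyExchange"] != DH_ANON and not params["sessionTicket"],
         SERVER_CERTIFICATE_SENT),
        # ...
    ]
    for frm, cond, to in valid_transitions:
        if old == frm and cond:   # first matching valid transition wins
            return to
    return old                    # no valid next state: stutter

params = {"keyExchange": 5, "sessionTicket": False}
assert handshake_transition(params, SERVER_HELLO_SENT) == SERVER_CERTIFICATE_SENT
# With anonymous DH the condition fails and the machine stutters:
assert handshake_transition({"keyExchange": DH_ANON, "sessionTicket": False},
                            SERVER_HELLO_SENT) == SERVER_HELLO_SENT
```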

```
type KeyExchange = [3]
(DH_anon : KeyExchange) = 0
// ...
(DH_RSA : KeyExchange) = 5
type Parameters =
{keyExchange : KeyExchange // Negotiated key exchange algorithm
,sessionTicket : Bit // The client had a session ticket
,renewSessionTicket : Bit // Server decides to renew a session ticket
,sendCertificateStatus : Bit // Server decides to send the certificate
                            // status message
,requestClientCert : Bit // Server requests a cert from the client
,includeSessionTicket : Bit} // Server includes a session ticket
                            // extension in SERVER_HELLO
```
Listing 3: Abstract connection parameters

```
message : State -> Message
message = lookupDefault messages (mkMessage noSender data error)
  where messages =
    [(helloRequestSent, mkMessage server handshake helloRequest)
    ,(clientHelloSent, mkMessage client handshake clientHello)
    ,(serverHelloSent, mkMessage server handshake serverHello)
    // ...
    ,(serverChangeCipherSpecSent,
      mkMessage server changeCipherSpec changeCipherSpecMessage)
    ,(serverFinishedSent, mkMessage server handshake finished)
    ,(applicationDataTransmission, mkMessage both data applicationData)
    ]
```
Listing 4: Expected message sent/received in each handshake state

**Socket Corking.** Socket corking is a mechanism for reducing packet fragmentation and increasing throughput by making sure full TCP frames are sent whenever possible. It is implemented in Linux and FreeBSD using the TCP\_CORK and TCP\_NOPUSH flags, respectively. When the flag is set, the socket is considered corked, and the operating system will only send complete (filled up to the buffer length) TCP frames. When the flag is unset, the current buffer, as well as all future writes, are sent immediately.

Writing to an uncorked socket is possible, but undesirable as it might result in partial packets being sent, potentially reducing throughput. On the other hand, forgetting to uncork a socket after the last write can have more serious consequences. According to the documentation, Linux limits the duration of corking to 200 ms, while FreeBSD has no limit. Hence, leaving a socket corked in FreeBSD might result in the data not being sent. We have verified that sockets are not corked or uncorked twice in a row. In addition, the structure of the message handling implementation in s2n helps us informally establish a stronger corking safety property. Because explicit handshake message sequences include the direction the message is sent, we can establish that the socket is (un)corked appropriately when the message direction changes. In future work we plan to expand the scope of our proof to allow us to formally establish full corking safety.
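The verified no-double-(un)cork property can be phrased as a simple monitor over a trace of cork/uncork events (our formulation for illustration, not s2n code):

```python
# Monitor for the verified property: a socket is never corked or
# uncorked twice in a row. Our illustrative formulation, not s2n code.
def corking_safe(trace):
    corked = False
    for event in trace:
        if event == "cork":
            if corked:
                return False      # corked twice in a row
            corked = True
        elif event == "uncork":
            if not corked:
                return False      # uncorked twice in a row
            corked = False
    return True

assert corking_safe(["cork", "uncork", "cork", "uncork"])
assert not corking_safe(["cork", "cork"])
assert not corking_safe(["uncork"])
```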

# **4 Operationalizing the Proof**

We have integrated the checking of our proof into the build system of s2n, as well as the Continuous Integration (CI) system used to check the validity of code as it is added to the s2n repository on GitHub. For the green "build passed" badge displayed on the s2n GitHub page to appear, all code updates must now successfully verify with our proof scripts. Not only do these checks run on committed code, they are also automatically run on all pull requests to the project. This allows the maintainers of s2n to quickly determine the correctness of submitted changes when they touch the code that we have proved. In this section we discuss aspects of our tooling that were important enablers of this integration.

*Proof Robustness.* For this integration to work, our proofs must be robust in the face of code changes. Evolving projects like s2n should not be slowed down by the need to update proofs every time the code changes. Too many proof updates can lead to significantly slowed development or, in the extreme case, to proofs being disabled or ignored in the CI environment. The automated nature of our proofs means that they generally need to be changed only in the event of interface modifications—either to function declarations or state definitions.

Of these two, state changes are the more common, and they can be quite complex considering that there are usually large, possibly nested, C structs involved (for example, the s2n\_connection struct has around 50 fields, some of which are structs themselves). To avoid the developer pain that would arise if such struct updates caused the proof to break, we have structured the verification so that proof scripts do not require updates when the modified portions of the state do not affect the computation being proved. Recall that our proofs are focused on functional correctness. Thus, in order to affect the proof, a new or modified field must influence the computation. Many struct changes target non-security-critical portions of the code (*e.g.* to track additional data for logging) and so do not meet this criterion. For such fields we prove that they are handled in a memory-safe manner and that they do not affect the computation performed by the code the proof script targets.

In the future, we intend to add to SAW the option to perform a "strict" version of this state-handling logic, which would ensure that newly added fields are not modified at all by the portion of the code being proved. Such a check would ensure that the computation being analyzed computes the specified function *and nothing else*, and would highlight cases in which new fields introduce undesirable data flows (e.g., incorrectly storing sensitive data). However, even such an option would not replace whole-program data flow analysis, which we recommend in cases where there is concern about potential incorrect data handling.

*Negative Test Cases.* Each of our proofs also includes a series of negative test cases that serve as evidence that the tools are functioning properly. These test cases patch the code with a variety of mistakes that might realistically occur, and then run the same proof scripts with the same build tools to check that the tools detect the introduced error.

Examples of the negative test cases we use include an incorrect modification to a side-channel mitigation, running our TLS proofs on a version of the code with an extra call to cork and uncork, on a version modified to allow early CCS, and on a version with the incomplete handshake bug that we discovered in the process of developing the proof. Such tests are critical, both to demonstrate the value of the proofs, by providing them with realistic bugs to catch, and as a defense against bugs that may be introduced into the tool as it is updated.
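As a toy illustration of how such negative tests work (this is not the actual SAW/s2n tooling; all names and the padding example are hypothetical), the following Python sketch seeds a bug into an implementation and checks that the same verification procedure that accepts the correct code rejects the mutant:

```python
# Toy analogue of a negative test case: seed a defect into an
# implementation and check that the verification procedure that accepts
# the correct code rejects the buggy variant. The real s2n negative
# tests patch the C code and re-run the SAW proof scripts.

def spec_pad(key: bytes, block_size: int) -> bytes:
    """Reference behaviour: zero-pad the key to the block size."""
    return key + b"\x00" * (block_size - len(key))

def impl_ok(key: bytes, block_size: int) -> bytes:
    return key.ljust(block_size, b"\x00")

def impl_buggy(key: bytes, block_size: int) -> bytes:
    # Seeded defect: off-by-one in the padding length.
    return key + b"\x00" * (block_size - len(key) - 1)

def verify(impl, trials=((b"k", 8), (b"abc", 16), (b"", 4))) -> bool:
    """Stand-in for the proof script: compare impl against the spec."""
    return all(impl(k, b) == spec_pad(k, b) for k, b in trials)

assert verify(impl_ok)          # the "proof" passes on the correct code
assert not verify(impl_buggy)   # and must fail on the seeded bug
```

A negative test that *passes* the verifier would indicate a bug in the tooling itself, which is exactly what these cases guard against.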

*Proof Metrics.* We also report real-time proof metrics. Our proof scripts print JSON-encoded statistics into the Travis logs, and an in-browser tool that we developed scrapes the project's Travis logs, compiling the relevant statistics into easily consumable charts and tables. The primary metrics we track are: (1) the number of lines of code analyzed by the proof (which increases as we develop proofs for more components of s2n), and (2) the number of times the verified code has been changed and re-analyzed (which tracks the ongoing value of the proof). This allows developers to easily track the impact of the proofs over time.
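A rough sketch of this pipeline might look as follows; the field names and log contents here are invented for illustration and are not those used by the actual s2n tooling, which scrapes Travis logs in the browser:

```python
import json

# Hypothetical sketch: proof scripts emit one JSON object per proof into
# the CI log; a scraper filters those out of the mixed build output and
# aggregates them. Field names are illustrative only.
log_lines = [
    "gcc -O2 -c s2n_hmac.c",                       # ordinary build output
    '{"proof": "hmac", "lines_verified": 510}',
    '{"proof": "drbg", "lines_verified": 220}',
]

def scrape_stats(lines):
    stats = []
    for line in lines:
        try:
            obj = json.loads(line)
        except ValueError:
            continue                                # skip non-JSON output
        if isinstance(obj, dict) and "proof" in obj:
            stats.append(obj)
    return stats

stats = scrape_stats(log_lines)
total = sum(s["lines_verified"] for s in stats)     # aggregate metric (1)
```

Emitting statistics inline with the build log keeps the metrics pipeline stateless: any tool that can read the CI logs can reconstruct the full history.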

Since the deployment of the proof to the CI system in November 2016, our proofs have been replayed 956 times. This number does not account for proof replays performed in forks of the repository. We have had to update the proof three times; in each case the proof update was complete before the code review process finished. Not all of these runs involved modifications to the code covered by our proofs; however, each run increased the confidence of the maintainers in the relevant code changes, and each run re-establishes the correctness of the code to the public, who may not be aware of which code changed in each commit.

HMAC and DRBG each took roughly 3 months of engineering effort. The TLS handshake verification took longer, at 8 months, though some of that time involved developing tool extensions to support reasoning about protocols. At the start of each project, the proof writers were familiar with the proof tools but not with the algorithms or with the s2n implementations of them. The effort figures listed above include understanding the C code, writing the specifications in Cryptol, developing the code-spec proofs using SAW, the CI implementation work, and the process of merging the proof artifacts into the upstream code base.

# **5 Conclusion**

In this case study we have described the development and operation in practice of a continuously checked proof ensuring key properties of the TLS implementation used by many Amazon and AWS services. Building on several previous attempts to prove the correctness of s2n, which either required too much developer interaction after code modifications or exceeded the scalability of the symbolic reasoning tools, we developed a proof structure that nearly eliminates the need for developers to understand or modify the proof following modifications to the code.


# **Symbolic Liveness Analysis of Real-World Software**

Daniel Schemmel<sup>1(B)</sup>, Julian Büning<sup>1</sup>, Oscar Soria Dustmann<sup>1</sup>, Thomas Noll<sup>2</sup>, and Klaus Wehrle<sup>1</sup>

> <sup>1</sup> Communication and Distributed Systems, RWTH Aachen University, Aachen, Germany {schemmel,buening,soriadustmann,wehrle}@comsys.rwth-aachen.de <sup>2</sup> Software Modeling and Verification, RWTH Aachen University, Aachen, Germany noll@cs.rwth-aachen.de

**Abstract.** Liveness violation bugs are notoriously hard to detect, especially due to the difficulty inherent in applying formal methods to real-world programs. We present a generic and practically useful liveness property which defines a program as being live as long as it will eventually either consume more input or terminate. We show that this property naturally maps to many different kinds of real-world programs.

To demonstrate the usefulness of our liveness property, we also present an algorithm that can be efficiently implemented to dynamically find lassos in the target program's state space during Symbolic Execution. This extends Symbolic Execution, a well-known dynamic testing technique, to find a new class of program defects, namely liveness violations, while incurring only a small runtime and memory overhead, as evidenced by our evaluation. The implementation of our method found a total of five previously undiscovered software defects in BusyBox and the GNU Coreutils. All five defects have been confirmed and fixed by the respective maintainers after shipping for years, most of them for well over a decade.

**Keywords:** Liveness analysis · Symbolic Execution · Software testing · Non-termination bugs

# **1 Introduction**

Advances in formal testing and verification methods, such as Symbolic Execution [10–12,22–24,42,49] and Model Checking [5,6,13,17,21,27,29,30,43,50], have enabled the practical analysis of real-world software. Many of these approaches are based on the formal specification of temporal system properties using sets of infinite sequences of states [1], which can be classified as either safety, liveness, or properties that are neither [31]. (However, every linear-time property can be represented as the conjunction of a safety and a liveness property.) This distinction is motivated by the different techniques employed for proving or disproving such properties. In practical applications, safety properties are prevalent. They constrain the finite behavior of a system, ensuring that "nothing bad" happens, and can therefore be checked by reachability analysis. Hence, efficient algorithms and tools have been devised for checking such properties that return a finite counterexample in case of a violation [34].

Liveness properties, on the other hand, do not rule out any finite behavior but constrain infinite behavior to eventually do "something good" [2]. Checking them generally requires more sophisticated algorithms, since these must be able to generate (finite representations of) infinite counterexamples. Moreover, common finite-state abstractions that are often employed for checking safety generally do not preserve liveness properties.

While it may be easy to create a domain-specific liveness property (e.g., "a GET/HTTP/1.1 must eventually be answered with an HTTP/1.1 {status}"), it is much harder to formulate *general* liveness properties. We tackle this challenge by proposing a liveness property based on the notion of programs as implementations of algorithms that transform input into output:

**Definition 1.** *A program is* live *if and only if it always eventually consumes input or terminates.*

By relying on input instead of output as the measure of progress, we circumnavigate difficulties caused by many common programming patterns such as printing status messages or logging the current state.

**Detection.** We present an algorithm to detect violations of this liveness property based on a straightforward idea: Execute the program and check after each instruction if the whole program state has been encountered before (identical contents of all registers and addressable memory). If a repetition is found that does not consume input, it is deterministic and will keep recurring ad infinitum. To facilitate checking real-world programs, we perform the search for such *lassos* in the program's state space while executing it symbolically.
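The detection idea above can be sketched as follows; the `step` function and the toy state machine are illustrative stand-ins for concrete execution of a real program, and concrete states stand in for the full register-and-memory contents:

```python
# Minimal sketch of lasso detection: step a deterministic program,
# remember every state seen since the last input consumption, and report
# a liveness violation when a state repeats. `step` returns a pair
# (next_state, consumed_input).

def find_lasso(initial, step, max_steps=10_000):
    seen = {initial}
    state = initial
    for _ in range(max_steps):
        state, consumed_input = step(state)
        if consumed_input:
            seen.clear()          # progress was made: forget old states
        elif state in seen:
            return state          # deterministic repetition => not live
        seen.add(state)
    return None                   # no violation found within the budget

# A toy program that gets stuck: it consumes input while counting up to
# 3, then oscillates between states 3 and 4 without consuming input.
def step(s):
    if s < 3:
        return s + 1, True
    return (4 if s == 3 else 3), False

assert find_lasso(0, step) in (3, 4)
```

Because the repetition occurs after the last input-consuming transition, the detected state is guaranteed to recur forever, which is exactly the lasso loop described next.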

**Examples.** Some examples that show the generality of this liveness property are:

1. Programs that operate on input from files and streams, such as cat, sha256sum or tail. This kind of program is intended to continue running as long as input is available. In some cases this input may be infinite (e.g., cat -).
2. Reactive programs, such as calc.exe or nginx, wait for events to occur. Once an event occurs, a burst of activity computes an answer, before the software goes back to waiting for the next event. Often, an event can be sent to signal a termination request. Such events are input just as much as the contents of a file read by the program are.

In rare cases, a program can intuitively be considered live without satisfying our liveness property. Most prominent is the yes utility, which will loop forever, only printing output. In our experience, the set of useful programs that intentionally allow for an infinite trace consuming only finite input is very small, and the violation of our liveness property can, in such cases, easily be recognized as intentional. Our evaluation supports this claim (cf. Sect. 6).

**Bugs and Violations.** The implementation of our algorithm detected a total of five unintended and previously unknown liveness violations in the GNU Coreutils and BusyBox, all of which had been in the respective codebases for between 7 and 19 years. All five bugs have been confirmed and fixed within days. The three implementations of yes that we tested as part of our evaluation were correctly detected not to be live. We also automatically generated liveness-violating input programs for all sed interpreters.

### **1.1 Key Contributions**

This paper presents four key contributions:


### **1.2 Structure**

We discuss related work (Sect. 2), before formally defining our liveness property (Sect. 3). Then, we describe the lasso detection algorithm (Sect. 4), demonstrate the practical applicability by implementing the algorithm for the SymEx engine KLEE (Sect. 5) and evaluate it on three real-world software suites (Sect. 6). We finally discuss the practical limitations (Sect. 7) and conclude (Sect. 8).

# **2 Related Work**

General liveness properties [2] can be verified by *proof-based methods* [40], which generally require heavy user support. In contrast, our work is based upon the state-exploration approach to verification. Another prominent approach to verifying the correctness of a system with respect to its specification is automatic *Model Checking* using automata- or tableau-based methods [5].

In order to combat state-space explosion, many optimization techniques have been developed. Most of these, however, are only applicable to safety properties. For example, *Bounded Model Checking (BMC)* of software is a well-established method for detecting bugs and runtime errors [7,18,19] that is implemented by a number of tools [16,38]. These tools investigate finite paths in programs by bounding the number of loop iterations and the depth of function calls, which is not necessarily suited to detect the sort of liveness violations we aim to discover. There is work trying to establish completeness thresholds of BMC for (safety and) liveness properties [33], but these are useful only for comparatively small

<sup>1</sup> https://github.com/COMSYS/SymbolicLivenessAnalysis.

systems. Moreover, most BMC techniques are based on boolean SAT, instead of SMT, as required for dealing with the intricacies of real-world software.

*Termination* is closely related to liveness in our sense, and has been intensively studied. It boils down to showing the well-foundedness of the program's transition relation by identifying an appropriate ranking function. In recent works, this is accomplished by first synthesizing conditional termination proofs for program fragments such as loops, and then combining sub-proofs using a transformation that isolates program states for which termination has not been proven yet [8]. A common assumption in this setting is that program variables are mathematical integers, which eases reasoning but is generally unsound. A notable exception is AProVE [28], an automated tool for termination and complexity analysis that takes (amongst others) LLVM intermediate code and builds a SymEx graph that combines SymEx and state-space abstraction, covering both byte-accurate pointer arithmetic and bit-precise modeling of integers. However, advanced liveness properties, floating point values, complex data structures and recursive procedures are unsupported. While a termination proof is a witness for our liveness property, an infinite program execution constitutes neither witness nor violation. Therefore, *non-termination* proof generators, such as TNT [26], while still related, are not relevant to our liveness property.

The authors of Bolt [32] present an entirely different approach, proposing an in-vivo analysis and correction method. Bolt does not aim to prove that a system terminates or not, but rather provides a means to force already running binaries out of a long-running or infinite loop. To this end, Bolt can attach to an unprepared, running program and will detect loops through memory snapshotting, comparing snapshots to a list of previous snapshots. A user may then choose to forcefully break the loop by applying one of two strategies as a last-resort option. Earlier research into in-vivo analysis of hanging systems attempted to prove that a given process has run into an infinite loop [9]. Similarly to Bolt, Looper [9] also attaches to a binary, but then uses Concolic Execution (ConEx) to gain insight into the remaining possible memory changes for the process. This allows for a diagnosis of whether the process is still making progress and will eventually terminate. Both approaches are primarily aimed at understanding or handling an apparent hang, not at proactively searching for unknown defects.

In [35], the authors argue that non-termination has been researched significantly less than termination. Similar to [14,25], they employ static analysis to find every Strongly Connected SubGraph (SCSG) in the Control Flow Graph (CFG) of a given program. Here, a Max-SMT solver is used to synthesize a formulaic representation of each node, which is both a quasi-invariant (i.e., always holding after it held once) and edge-closing (i.e., not allowing a transition that leaves the node's SCSG to be taken). If the solver succeeds for each node in a reachable SCSG, a non-terminating path has been found.

In summary, the applicability of efficient methods for checking liveness in our setting is hampered by restrictions arising from the programming model, the supported properties (e.g., only termination), scalability issues, missing support for non-terminating behavior or false positives due to over-approximation. In the following, we present our own solution to liveness checking of real-world software.

# **3 Liveness**

We begin by formally defining our liveness property following the approach by Alpern and Schneider [1–3], which relies on the view that liveness properties do not constrain the finite behaviors but introduce conditions on infinite behaviors. Here, possible behaviors are given by (edge-labeled) transition systems.

**Definition 2 (Transition System).** *A* transition system *T is a 4-tuple* (*S*, *Act*, −→, *I*)*, where:*

- *S is a set of* states*,*
- *Act is a set of* actions*,*
- −→ ⊆ *S* × *Act* × *S is the* transition relation*, and*
- *I* ⊆ *S is the set of* initial states*.*

*For* s ∈ *S*, *the set of* outgoing actions *is denoted by Out*(s) = {α ∈ *Act* | s <sup>α</sup>−→ s′ *for some* s′ ∈ *S*}*. Moreover, we require T to be* deadlock free*, i.e., Out*(s) ≠ ∅ *for each* s ∈ *S. A* terminal state *is indicated by a self-loop involving the distinguished action* ↓ ∈ *Act: if* ↓ ∈ *Out*(s)*, then Out*(s) = {↓}*.*

The self-loops ensure that all *executions* of a program are infinite, which is necessary as terminal states indicate successful completion in our setting.

**Definition 3 (Executions and Traces).** *An (infinite)* execution *is a sequence of the form* s<sub>0</sub>α<sub>1</sub>s<sub>1</sub>α<sub>2</sub>s<sub>2</sub>... *such that* s<sub>0</sub> ∈ *I and* s<sub>i</sub> <sup>α<sub>i+1</sub></sup>−−→ s<sub>i+1</sub> *for every* i ∈ ℕ*. Its* trace *is given by* α<sub>1</sub>α<sub>2</sub>... ∈ *Act*<sup>ω</sup>*.*

### **Definition 4 (Liveness Properties)**

*Let* Π ⊆ *Act denote the set of* productive actions*, consisting of all input-consuming actions and the termination action* ↓*. A program is* live *if every execution contains infinitely many productive actions, i.e., each trace* α<sub>1</sub>α<sub>2</sub>... *satisfies* α<sub>i</sub> ∈ Π *for infinitely many* i ∈ ℕ*.*
A liveness property is generally characterized by the requirement that each finite trace prefix can be extended to an infinite trace that satisfies this property. In our setting, this means that in each state of a given program it is guaranteed that eventually a productive action will be performed. That is, infinitely many productive actions will occur during each execution. As ↓ is considered productive, terminating computations are live. This differs from the classical setting where terminal states are usually considered as deadlocks that violate liveness.

We assume that the target machine is deterministic w.r.t. its computations and model the consumption of input as the only source of non-determinism. This means that if the execution is in a state in which the program will execute a non-input instruction, only a single outgoing (unproductive) transition exists. If the program is to consume input, on the other hand, a (productive) transition exists for every possible value of the input. We only consider functions that provide at least one bit of input as input functions, which makes ↓ the only productive action that is also deterministic, that is, the only productive transition which must be taken once the state it originates from is reached. More formally, |*Out*(s)| > 1 ⇔ *Out*(s) ⊆ Π \ {↓}. Thus, if a (sub-)execution s<sub>i</sub>α<sub>i+1</sub>s<sub>i+1</sub>... contains no productive transitions beyond ↓, it is fully specified by its first state s<sub>i</sub>, as there will only ever be a single transition to be taken.

Similarly, we assume that the target machine has finite memory. This implies that the number of possible states is finite: |*S*| ∈ ℕ. Although we model each possible input with its own transition, input words are finite too; therefore *Act* is finite, and hence so is *Out*(s) for each s ∈ *S*.

# **4 Finding Lassos**

Any trace t that violates a liveness property must necessarily consist of a finite prefix p that leads to some state s ∈ *S*, after which no further productive transitions are taken. Therefore, t can be written as t = pq, where p is finite and may contain productive actions, while q is infinite and does not contain productive actions. Since *S* is a finite set and every state from s onward will only have a single outgoing transition and successor, q must contain a cycle that repeats itself infinitely often. Therefore, q in turn can be written as q = fc<sup>ω</sup> where f is finite and c non-empty. Due to its shape, we call this a *lasso* with pf the *stem* and c the *loop*.

Due to the infeasible computational complexity of checking our liveness property statically (in the absence of input functions, it becomes the finite-space halting problem), we leverage a dynamic analysis that is capable of finding any violation in bounded time and works incrementally to report violations as they are encountered. We do so by searching the state space for a lasso whose loop does not contain any productive transitions. This is naïvely achieved in the dynamic analysis by checking whether any other state visited since the last productive transition is equal to the current one. In this case the current state deterministically transitions to itself, i.e., is part of the loop.

To implement this idea without prohibitively expensive resource usage, two main challenges must be overcome: 1. Exhaustive exploration of all possible inputs is infeasible for nontrivial cases. 2. Comparing states requires up to 2<sup>64</sup> byte comparisons on a 64-bit computer. In the rest of this section, we discuss how to leverage SymEx to tackle the first problem (Sect. 4.1) and how to reduce the cost of state comparisons with specially composed hash-based fingerprints (Sect. 4.2).

### **4.1 Symbolic Execution**

*Symbolic Execution (SymEx)* has become a popular dynamic analysis technique whose primary domain is automated test case generation and bug detection [10–12,15,22,41,42,49]. The primary intent behind SymEx is to improve upon exhaustive testing by symbolically constraining inputs instead of iterating over all possible values, which makes it a natural fit.

**Fig. 1.** SymEx tree showing the execution of a snippet with two ifs. The variable <sup>x</sup> is symbolic and one state is unreachable, as its Path Constraint is unsatisfiable.

**Background.** The example in Fig. 1 tests whether the variable x is in the range from 5 to 99 by performing two tests before returning the result. As x is the input to this snippet, it is initially assigned an unconstrained symbolic value. Upon branching on x < 5 in line 2, the SymEx engine needs to consider two cases: one in which x is now constrained to be smaller than 5, and another in which it is constrained to *not* be smaller than 5. On the path on which x < 5 held, ok is then assigned false, while the other path does not execute that instruction. Afterwards, both paths encounter the branch if (x >= 100) in line 4. Since the constraint set {x < 5, x ≥ 100} is unsatisfiable, the leftmost of the four resulting possibilities is unreachable and therefore not explored. The three remaining paths reach the return statement in line 6. We call the set of currently active constraints the *Path Constraint (PC)*. The PC is usually constructed in such a way as to contain constraints in the combined theories of quantifier-free bit-vectors, finite arrays and floating point numbers<sup>2</sup>.
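The pruning of the unsatisfiable path can be illustrated with a small brute-force sketch, in which satisfiability over a tiny domain stands in for the SMT solver; this is a toy model, not how KLEE represents constraints:

```python
# Brute-force illustration of the SymEx tree for the two-if snippet: a
# path constraint (PC) is a list of predicates over x, and a path is
# only explored if some value of x satisfies its entire PC. Enumeration
# over a small domain stands in for the SMT solver here.
DOMAIN = range(256)

def satisfiable(pc):
    return any(all(p(x) for p in pc) for x in DOMAIN)

branches = [                     # the two ifs from the snippet
    (lambda x: x < 5,    lambda x: x >= 5),
    (lambda x: x >= 100, lambda x: x < 100),
]

paths = [[]]
for then_cond, else_cond in branches:
    paths = [pc + [c]
             for pc in paths
             for c in (then_cond, else_cond)
             if satisfiable(pc + [c])]

# {x < 5, x >= 100} is unsatisfiable, so only 3 of 4 paths are explored.
assert len(paths) == 3
```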

**Symbolic Execution of the Abstract Transition System.** By using symbolic values, a single SymEx state can represent a large number of states in the transition system. We require that the SymEx engine, as is commonly done, never assigns a symbolic value (with more than one satisfying model) to the instruction pointer. Since the productive transitions of the transition system are derived from instructions in the program code, this means that each instruction that the SymEx engine performs either corresponds to a number of productive, input-consuming transitions, or a number of unproductive, *not* input-consuming transitions. Therefore, any lasso in the SymEx of the program is also a lasso in the transition system (the ↓ transition requires trivial special treatment).

To ensure that the opposite is also true, a simple and common optimization must be implemented in the SymEx engine: Only add branch conditions to the PC that are not already implied by it. This is the case iff exactly one of the two branching possibilities is satisfiable, which the SymEx engine (or rather its SMT solver) needs to check in any case. Thereby it is guaranteed that if the SymEx state is part of a loop in the transition system, not just the concrete

<sup>2</sup> While current SymEx engines and SMT solvers still struggle with the floating point theory in practice [37], the SMT problem is decidable for this combination of theories. Bitblasting [20] gives a polynomial-time reduction to the boolean SAT problem.

values, but also the symbolic values will eventually converge towards a steady state. Again excluding trivial special treatment for program termination, a lasso in the transition system thus entails a lasso in the SymEx of the program.

### **4.2 Fingerprinting**

To reduce the cost of each individual comparison between two states, we take an idea from hash maps by computing a *fingerprint* ρ for each state and comparing those. A further significant improvement is possible by using a strong cryptographic hash algorithm to compute the fingerprint: Being able to rely (with very high probability) on the fingerprint comparison reduces the memory requirements, as it becomes unnecessary to store a list of full predecessor states. Instead, only the fingerprints of the predecessors need to be kept.

Recomputing the fingerprint from scratch would still require a full scan over the whole state after each instruction, however. Instead, we enable efficient, incremental computation of the fingerprint by not hashing everything, but rather hashing many small *fragments*, and then composing the resulting hashes using bitwise xor. Then, if an instruction attempts to modify a fragment f, it is easy to compute the old and new fragment hashes. The new fingerprint ρ<sub>new</sub> can then be computed as ρ<sub>new</sub> := ρ<sub>old</sub> ⊕ *hash*(f<sub>old</sub>) ⊕ *hash*(f<sub>new</sub>). Changing a single fragment therefore requires only two computations and bitwise xors on constant-size bit strings—one to remove the old fragment from the composite and one to insert the new one. Each incremental fingerprint update only modifies a small number of fragments, statically bounded by the types used in the program.
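A minimal sketch of this xor composition follows, using 256-bit BLAKE2b as in Sect. 5.2; the byte-string fragment encoding is illustrative only:

```python
from hashlib import blake2b

# Sketch of xor-composed fingerprints: the fingerprint of a state is the
# xor of the hashes of its fragments, so replacing one fragment needs
# two hash computations instead of a full rescan of the state.

def h(fragment: bytes) -> int:
    return int.from_bytes(blake2b(fragment, digest_size=32).digest(), "big")

def fingerprint(fragments) -> int:
    fp = 0
    for f in fragments:
        fp ^= h(f)
    return fp

state = [b"frag-a", b"frag-b", b"frag-c"]
fp_old = fingerprint(state)

# Incrementally replace fragment b"frag-b" with b"frag-B":
fp_new = fp_old ^ h(b"frag-b") ^ h(b"frag-B")

state[1] = b"frag-B"
assert fp_new == fingerprint(state)   # matches a full recomputation
```

Because xor is associative, commutative and self-inverse, the order in which fragments are hashed is irrelevant and removal is the same operation as insertion.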

### **4.3 Algorithm Overview**

The proposed algorithm explores as much of the program's state space as possible within a specified amount of time, using SymEx to cover large portions of the input space simultaneously. Every SymEx state is efficiently checked against all of its predecessors by comparing their fingerprints.

# **5 Efficient Implementation of the Algorithm**

To develop the algorithm presented in the previous section into a practically useful program, we decided to build upon the KLEE SymEx engine [10], with which many safety bugs in real-world programs have been previously found [10,15,41]. As KLEE in turn builds upon the LLVM compiler infrastructure [36], this section begins with a short introduction to LLVM Intermediate Representation (IR) (Sect. 5.1), before explaining how the fragments whose hashes make up the fingerprint can be implemented (Sect. 5.2) and how to track fingerprints (Sect. 5.3). Finally, we detail a technique to avoid as many comparisons as possible (Sect. 5.4).

### **5.1 LLVM Intermediate Representation**

*LLVM Intermediate Representation (IR)* was designed as a typed, low-level language independent of both the (high-level) source language and any specific target architecture, to facilitate compiler optimizations. It operates on an unlimited number of typed registers of arbitrary size, as well as on addressable memory. IR is in Static Single Assignment (SSA) form, i.e., registers are only ever assigned once and never modified. The language also has functions, which have a return type and an arbitrary number of typed parameters. Apart from global scope there is only function scope; IR features no block scope.

Addressable objects are either global variables, or explicitly allocated, e.g., using malloc (cleaned up with free) or alloca (cleaned up on return from function).


**Fig. 2.** Six kinds of fragments suffice to denote all possible variants. Symbolic values are written as serialized symbolic expressions consisting of all relevant constraints. All other fields only ever contain concrete values, which are simply used verbatim. Fields of dynamic size are denoted by a ragged right edge.

### **5.2 Fragments**

When determining what is to become a fragment, i.e., an atomic portion of a fingerprint, two major design goals should be taken into consideration:

- (a) The hashing algorithm should be chosen in a manner that makes collisions so unlikely as to be non-existent in practice.
- (b) The fragments themselves need to be generated in a way that ensures that no two different fragments have the same representation, as that would of course cause their hashes to be equal as well.

**Avoiding Collisions.** In order to minimize the risk of accidental collisions, which would reduce the efficacy of our methodology, we chose the cryptographically secure checksum algorithm BLAKE2b [4] to generate 256 bit hashes, providing 128 bit collision resistance. To the best of our knowledge, there are currently

**Fig. 3.** Incremental computation of a new fingerprint. Fingerprints are stored in a call stack, with each stack frame containing a partial fingerprint of all addressable memory allocated locally in that function, another partial fingerprint of all registers used in the function and a list of previously encountered fingerprints. A partial fingerprint of all dynamic and global variables is stored independently.

no relevant structural attacks on BLAKE2b, which allows us to assume that the collision resistance is given. For comparison: the revision control system Git currently uses 160 bit SHA-1 hashes to create unique identifiers for its objects, with plans underway to migrate to a stronger 256 bit hash algorithm<sup>3</sup>.

To ensure that the fragments themselves are generated in a collision-free manner, we structure them with three fields each, as can be seen in Fig. 2. The first field contains a tag that lets us distinguish between different types of fragments, the middle field contains an address appropriate for that type, and the last field is the value that the fragment represents. We distinguish between three different address spaces: 1. main memory, 2. LLVM registers, which similarly to actual processors hold values that do not have a main memory address, and 3. function arguments, which behave similarly to ordinary LLVM registers, but require a certain amount of special handling in our implementation. For example, the fragment (0x01, 0xFF3780, 0xFF) means that the memory address 0xFF3780 holds the concrete byte 0xFF. This fragment hashes to ea58...f677.

If the fragment represents a concrete value, its size is statically bounded by the kind of write being done. For example, a write to main memory requires 1 byte + 8 byte + 1 byte = 10 byte, and modifying a 64 bit register requires 1 byte + 8 byte + 64 bit / (8 bit/byte) = 17 byte. In the case of fragments representing symbolic values, on the other hand, such a guarantee cannot effectively be made, as the symbolic expression may become arbitrarily large. Consider, for example, a symbolic expression of the form λ = input<sub>1</sub> + input<sub>2</sub> + ... + input<sub>n</sub>, whose result is directly influenced by an arbitrary number n of input words.
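A possible serialization matching the sizes above might look as follows; the concrete byte layout is an assumption of this sketch, so it does not reproduce the digest quoted earlier:

```python
from hashlib import blake2b

# Illustrative serialization of concrete fragments with the sizes given
# in the text: a 1-byte tag, an 8-byte address and the raw value bytes.
# The actual byte layout used by the implementation may differ.

def concrete_fragment(tag: int, address: int, value: bytes) -> bytes:
    return tag.to_bytes(1, "big") + address.to_bytes(8, "big") + value

frag = concrete_fragment(0x01, 0xFF3780, b"\xff")    # one memory byte
assert len(frag) == 1 + 8 + 1                        # 10 bytes, as stated

reg = concrete_fragment(0x02, 7, (2**64 - 1).to_bytes(8, "big"))
assert len(reg) == 1 + 8 + 8                         # 17 bytes for a 64-bit register

digest = blake2b(frag, digest_size=32).hexdigest()   # 256-bit BLAKE2b
assert len(digest) == 64                             # 32 bytes in hex
```

Keeping the tag and address in fixed-width fields is what makes the encoding prefix-free for concrete values, so no two distinct fragments can share a representation.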

In summary, fragments are created in a way that precludes structural weaknesses as long as the hash algorithm used (in our case 256 bit BLAKE2b) remains unbroken and collisions are significantly less probable than transient failures of the computer performing the analysis.

<sup>3</sup> https://www.kernel.org/pub/software/scm/git/docs/technical/hash-function-transition.html (Retrieved Jan. 2018).

### **5.3 Fingerprint Tracking**

When using the KLEE SymEx engine, the call stack is not explicitly mapped into the program's address space, but rather directly managed by KLEE itself. This enables us to further extend the practical usefulness of our analysis by only considering fragments that are directly addressable from each point of the execution, which in turn enables the detection of certain non-terminating recursive function calls. It also goes well together with the implicit cleanup of all function variables when a function returns to its caller.

To incrementally construct the current fingerprint, we utilize a stack that follows the current call stack, as exemplified in Fig. 3. Each entry consists of three different parts: 1. a (partial) fingerprint over all local registers, i.e., objects that are not globally addressable, 2. a (partial) fingerprint over all locally allocated objects in main memory and 3. a list of pairs of instruction IDs and fingerprints that denote the states that were encountered previously.

**Modifying Objects.** Any instruction modifying an object without reading input, such as an addition, is dealt with as explained previously: First, recompute the hash of the old fragment(s) before the instruction is performed and remove it from the current fingerprint. Then, perform the instruction, compute the hash of the new fragment(s) and add it to the current fingerprint.

Similarly modify the appropriate partial fingerprint, e.g., for a load the fingerprint of all local registers of the current function. Note that this requires each memory object to be mappable to where it was allocated from.
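The remove-then-add update can be sketched with xor-composed hashes. The helper frag_hash and its byte layout are illustrative assumptions, not the paper's exact implementation:

```python
import hashlib

def frag_hash(tag: int, address: int, value: bytes) -> int:
    """Illustrative fragment hash: 256-bit BLAKE2b as an integer."""
    data = bytes([tag]) + address.to_bytes(8, "big") + value
    return int.from_bytes(hashlib.blake2b(data, digest_size=32).digest(), "big")

# Fingerprint of a state whose only fragment is the byte 0xFF at 0x1000.
fp = frag_hash(0x01, 0x1000, b"\xff")

# A store now overwrites that byte with 0x2A:
fp ^= frag_hash(0x01, 0x1000, b"\xff")  # step 1: remove the old fragment (xor cancels)
fp ^= frag_hash(0x01, 0x1000, b"\x2a")  # step 2: add the new fragment
```

Because xor is its own inverse, removing a fragment is the same operation as adding it, which is what makes the incremental update cheap.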

**Function Calls.** To perform a function call, push a new entry onto the stack with the register fingerprint initialized to the xor of the hashes of the argument fragments and the main memory fingerprint set to the neutral element, zero. Update the current fingerprint by removing the caller's register fingerprint and adding the callee's register fingerprint. Add the pair of entry point and current fingerprint to the list of previously seen fingerprints.

**Function Returns.** When returning from a function, first remove both the fingerprint of the local registers, as well as the fingerprint of local, globally addressable objects from the current fingerprint, as all of these will be implicitly destroyed by the returning function. Then pop the topmost entry from the stack and re-enable the fingerprint of the local registers of the caller.

**Reading Input.** Upon reading input all previously encountered fingerprints must be disregarded by clearing all fingerprint lists of the current SymEx state.
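The call, return, and input-reading rules above can be sketched as a small fingerprint stack. All names here are ours and the bookkeeping is simplified (e.g., local main-memory fingerprints are never populated); the real implementation lives inside KLEE:

```python
import hashlib

def h(data: bytes) -> int:
    """Illustrative 256-bit BLAKE2b hash of an argument fragment."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=32).digest(), "big")

class Frame:
    def __init__(self, reg_fp: int):
        self.reg_fp = reg_fp  # fingerprint over local registers
        self.mem_fp = 0       # fingerprint over locally allocated memory objects
        self.seen = []        # (instruction id, fingerprint) pairs seen so far

class FingerprintStack:
    def __init__(self):
        self.current = 0      # the current fingerprint (xor-composed)
        self.stack = []

    def call(self, entry_id: int, arg_hashes):
        reg_fp = 0
        for ah in arg_hashes:               # callee registers start as the
            reg_fp ^= ah                    # xor of the argument fragments
        if self.stack:
            self.current ^= self.stack[-1].reg_fp  # remove caller's registers
        self.current ^= reg_fp                     # add callee's registers
        frame = Frame(reg_fp)
        frame.seen.append((entry_id, self.current))
        self.stack.append(frame)

    def ret(self):
        frame = self.stack.pop()            # callee locals are destroyed
        self.current ^= frame.reg_fp ^ frame.mem_fp
        if self.stack:
            self.current ^= self.stack[-1].reg_fp  # re-enable caller's registers

    def read_input(self):
        for frame in self.stack:            # past fingerprints no longer count
            frame.seen.clear()
```

For instance, after calling into a function and returning, the current fingerprint is restored to the caller's value, matching the implicit cleanup of function-local state.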

### **5.4 Avoiding Comparisons**

While it would be sufficient to simply check all previous fingerprints for a repetition every time the current fingerprint is modified, doing so would be rather inefficient. Our implementation therefore attempts to perform as few comparisons as possible.

We reduce the number of fingerprints that need to be considered at any point by exploiting the structure of the call stack: To find any non-recursive infinite loop, it suffices to search the list of the current stack frame, while recursive infinite loops can be identified using only the first fingerprint of each stack frame.

We also exploit static control flow information by only storing and testing fingerprints for Basic Blocks (BBs), which are sequences of instructions with linear control flow<sup>4</sup>. If any one instruction of a BB is executed infinitely often, all of them are. Thus, a BB is either fully in the infinite cycle, or no part of it is.

It is not even necessary to consider every single BB, as we are looking for a trace with a finite prefix leading into a cycle. As the abstract transition system is an unfolding of the CFG, any cycle in the transition system must unfold from a cycle in the CFG. Any reachable cycle in the CFG must contain a BB with more than one predecessor, as at least one BB must be reachable from both outside and inside the cycle. Therefore, it is sufficient to only check BBs with multiple predecessors. As IR only provides intraprocedural CFGs, we additionally perform a check for infinite recursion at the beginning of each function.
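The predecessor-count criterion can be sketched as follows; the encoding of the CFG as a successor map is an illustrative assumption:

```python
def check_points(cfg):
    """Given a CFG as {block: [successors]}, return the blocks at which
    fingerprints need to be stored and compared: those with more than
    one predecessor, since every reachable cycle must contain one."""
    preds = {b: 0 for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s] += 1
    return {b for b, n in preds.items() if n > 1}

# A simple loop:  entry -> header -> body -> header, header -> exit.
cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
# Only the loop header has two predecessors (entry and body), so it is
# the only block that needs a fingerprint check.
```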

# **6 Evaluation**

In this section we demonstrate the effectiveness and performance of our approach on well-tested and widely used real-world software. We focus on three different groups of programs: 1. the GNU Coreutils and GNU sed (Sect. 6.1), 2. BusyBox (Sect. 6.2) and 3. Toybox (Sect. 6.3). We evaluate the performance of our liveness analysis in comparison with baseline KLEE using two metrics: 1. instructions per second and 2. peak resident set size. Additionally, we analyze the impact of the time limit on the overhead (Sect. 6.4). We summarize our findings in Sect. 6.5.

**Setup.** We used revision aa01f83<sup>5</sup> of our software, which is based on KLEE revision 37f554d<sup>6</sup>. Both versions are invoked as suggested by the KLEE authors and maintainers [10,47] in order to maximize reproducibility and ensure realistic results. However, we chose the Z3 [39] solver over STP [20], as the former provides a native timeout feature, enabling more reliable measurements. The solver timeout is 30 s and the memory limit is 10 000 MiB.

We run each configuration 20 times in order to gain statistical confidence in the results. From every single run, we extract both the number of executed instructions, allowing us to compute the instructions per second, and the peak resident set size of the process, i.e., the maximal amount of memory used. We additionally reproduced the detected liveness violations with 30 runs each with a time limit of 24 h, recording the total time required for our implementation to find the first violation. For all results we give a 99% confidence interval.
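For reference, a two-sided 99% Student-t confidence interval over 20 runs can be computed as below. The critical value 2.861 for 19 degrees of freedom is from standard t-tables; the routine itself is our sketch, not the paper's evaluation script:

```python
from math import sqrt
from statistics import mean, stdev

def ci99(samples, t_crit=2.861):
    """Mean and half-width of a two-sided 99% Student-t confidence
    interval. t_crit = 2.861 is the critical value for 19 degrees of
    freedom (20 runs); adjust it for other sample sizes."""
    m = mean(samples)
    half = t_crit * stdev(samples) / sqrt(len(samples))
    return m, half
```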

<sup>4</sup> In IR there is an exception for function calls: they do not break up BBs.

<sup>5</sup> https://github.com/COMSYS/SymbolicLivenessAnalysis/tree/aa01f83.

<sup>6</sup> https://github.com/klee/klee/tree/37f554d.

### **6.1 GNU Utilities**

We combine the GNU tools from the Coreutils 8.25 [45] with GNU sed 4.4 [46], as the other tool suites also contain an implementation of the sed utility. We excluded 4 tools from the experiment as their execution is not captured by KLEE's system model. Thus, the experiment contains a total of 103 tools.

**Violations.** The expected liveness violation in yes occurred after 2.51 s ± 0.26 s. In 26 out of 30 runs, we were also able to detect a violation in GNU sed after a mean computation time of 8.06 h ± 3.21 h (KLEE's timeout was set to 24 h). With the symbolic arguments restricted to one argument of four symbolic characters, reproduction completed in all 30 runs with a mean of 5.19 min ± 0.17 min.

**Fig. 4.** GNU Coreutils and GNU sed, 60 min time limit. Relative change of instructions per second (top) and peak resident set (bottom) versus the KLEE baseline. Note the logarithmic scale and the black 99% confidence intervals.

**Fig. 5.** BusyBox, 60 min time limit. Relative change of instructions per second (top) and peak resident set (bottom) versus the KLEE baseline. Note the logarithmic scale and the black 99% confidence intervals.

We detected multiple violations in tail stemming from two previously unknown bugs, which we reported. Both bugs were originally detected and reported in version 8.25<sup>7</sup> and fixed in version 8.26. Both bugs had been in the codebase for over 16 years. Reproducing the detection was successful in 30 of 30 attempts, with a mean time of 1.59 h ± 0.66 h until the first detected violation.

We detected another previously unknown bug in ptx. Although we originally identified the bug in version 8.27, we reported it after the release of 8.28<sup>8</sup>, leading

<sup>7</sup> GNU tail report 1: http://bugs.gnu.org/24495.

GNU tail report 2: http://bugs.gnu.org/24903.

<sup>8</sup> GNU ptx report: http://bugs.gnu.org/28417.

to a fix in version 8.29. This bug is not easily detected: Only 9 of 30 runs completed within the time limit of 24 h. For these, mean time to first detection was 17.15 h ± 3.74 h.

**Performance.** Figure 4 shows the relative changes in instructions per second and peak resident set. As can be seen, performance is only reduced slightly below the KLEE baseline, and the memory overhead is even less significant. The leftmost tool, make-prime-list, shows by far the most significant change from the KLEE baseline. This is because make-prime-list reads only very little input, followed by a very complex computation in the course of which no further input is read.

# **6.2 BusyBox**

For this experiment we used BusyBox version 1.27.2 [44]. As BusyBox contains a large number of network tools and daemons, we had to exclude 232 tools from the evaluation, leaving us with 151 tools.

**Violations.** Compared with Coreutils' yes, detecting the expected liveness violation in the BusyBox implementation of yes took rather long at 27.68 s ± 0.33 s. We were unable to detect any violations in BusyBox sed without restricting the size of the symbolic arguments. When restricting them to one argument with four symbolic characters, we found the first violation in all 30 runs within 1.44 h ± 0.08 h. Our evaluation uncovered two previously unknown bugs in BusyBox hush<sup>9</sup>. We first detected both bugs in version 1.27.2. In all 30 runs, a violation was detected after 71.73 s ± 5.00 s.

**Performance.** As shown in Fig. 5, BusyBox exhibits a higher average slowdown than the GNU Coreutils (cf. Fig. 4). Several tools show a *decrease* in memory consumption, which we attribute to the drop in retired instructions. yes shows the lowest relative throughput, as baseline KLEE evaluates the infinite loop very efficiently.

**Fig. 6.** Toybox, 60 min time limit. Relative change of instructions per second (top) and peak resident set (bottom) versus the KLEE baseline. Note the logarithmic scale and the black 99% confidence intervals.

<sup>9</sup> BusyBox hush report 1: https://bugs.busybox.net/10421. BusyBox hush report 2: https://bugs.busybox.net/10686.

### **6.3 Toybox**

The third and final experiment with real-world software consists of 100 tools from toybox 0.7.5 [48]. We excluded 76 of the total of 176 tools, which rely on operating system features not reasonably modeled by KLEE.

**Violations.** For yes we encounter the first violation after 6.34 s ± 0.24 s, which puts it in between the times for GNU yes and BusyBox yes. This violation is also triggered from env by way of toybox's internal path lookup. As with the other sed implementations, toybox sed often fails to complete when run with the default parameter set. With only one symbolic argument of four symbolic characters, however, we encountered a violation in all 30 runs within 4.99 min ± 0.25 min.

**Performance.** Overall, the performance of our approach on toybox falls in between those for the GNU Coreutils and BusyBox, as can be seen in Fig. 6. Both memory and throughput overhead are limited. For most toybox tools, the overhead is small enough to warrant always enabling our changes when running KLEE.

**Fig. 7.** Changes in instructions per second, peak resident set and branch coverage over multiple KLEE timeouts. Note the logarithmic scale and the black 99% confidence intervals.

**Fig. 8.** Heap usage of a 30 min BusyBox hush run. The 186 vertical lines show detected liveness violations.

### **6.4 Scaling with the Time Limit**

To ascertain whether the performance penalty incurred by our implementation scales with the KLEE time limit, we have repeated each experiment with time limits of 15 min, 30 min and 60 min. The results shown in Fig. 7 indicate that, at least at this scale, baseline KLEE and our implementation scale equally well. This holds for almost all relevant metrics: retired instructions per second, peak resident set and covered branches. The prominent exception is BusyBox's memory usage, exemplified in Fig. 8 by a 30 min run of BusyBox hush. As can be seen, the overhead introduced by the liveness analysis remains mostly stable at about a quarter of the total heap usage.

### **6.5 Summary**

All evaluated tool suites show a low average performance and memory penalty when comparing our approach to baseline KLEE. While the slowdown is significant for some tools in each suite, it is consistent as long as time and memory limits are not chosen too tightly. In fact, for these kinds of programs, it is reasonable to accept a limited slowdown in exchange for opening up a whole new category of defects that can be detected. In direct comparison, performance varies in between suites, but remains reasonable in each case.

# **7 Limitations**

Our approach does not distinguish between interpreters and interpreted programs. While this enables the automatic derivation of input programs for such interpreters as sed, it also makes it hard to recognize meaningful error cases. This causes the analysis of all three implementations of sed used in the evaluation (Sect. 6) to return liveness violations.

In its current form, our implementation struggles with runaway counters, as a 64 bit counter cannot be practically enumerated on current hardware. Combining our approach with static analyses, such as those done by optimizing compilers, may significantly reduce the impact of this problem in the future.

A different pattern that may confound our implementation is related to repeated allocations. If memory is requested again after releasing it, the newly acquired memory may not be at the same position, which causes any pointers to it to have different values. While this is fully correct, it may cause the implementation to not recognize cycles in a reasonable time frame. This could be mitigated by analyzing whether the value of the pointer ever actually matters. For example, in the C programming language, it is fairly uncommon to inspect the numerical value of a pointer beyond comparing it to NULL or other pointers. A valid solution would however require strengthening KLEE's memory model, which currently does not model pointer inspection very well.

Another potential problem is how the PC is serialized when using symbolic expressions as the value of a fragment (cf. Sect. 5.2). We currently reuse KLEE's serialization routines, which are not exactly tuned for performance. Also, each symbolic value generated by KLEE is assigned a unique name that is then displayed by the serialization, which obscures potential equivalences.

Finally, by building upon SymEx, we inherit not only its strengths, but also its weaknesses, such as a certain predilection for state explosion and a reliance on repeated SMT solving [12]. Moreover, actual SymEx implementations impose further limitations. For example, KLEE returns a concrete pointer from allocation routines instead of a symbolic value representing all possible addresses.

### **8 Conclusion and Outlook**

It is our strong belief that the testing and verification of liveness properties needs to become more attractive to developers of real-world programs. Our work provides a step in that direction with the formulation of a liveness property that is general and practically useful, thereby enabling even developers uncomfortable with interacting with formal testing and verification methods to at least check their software for liveness violation bugs.

We demonstrated the usefulness of our liveness property by implementing it as an extension to the Symbolic Execution engine KLEE, thereby enabling it to discover a class of software defects it could not previously detect, and analyzing several large and well-tested programs. Our implementation caused the discovery and eventual correction of a total of five previously unknown defects, three in the GNU Coreutils, arguably one of the most well-tested code bases in existence, and two in BusyBox. Each of these bugs had been in released software for over 7 years—four of them even for over 16 years, which goes to show that this class of bugs has so far proven elusive. Our implementation did not cause a single false positive: all reported violations are indeed accompanied by concrete test cases that reproduce a violation of our liveness property.

The evaluation in Sect. 6 also showed that the performance impact, in terms of throughput as well as memory consumption, remains significantly below 2× on average, while allowing the analysis to detect a completely new range of software defects. We demonstrated that this overhead remains stable over a range of different analysis durations.

In future work, we will explore the opportunities for same-state merging that our approach enables by implementing efficient equality testing of SymEx states via our fingerprinting scheme. We expect that this will further improve the performance of our approach and maybe even exceed KLEE's baseline performance by reducing the amount of duplicate work done.

**Acknowledgements.** This research is supported by the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (grant agreement №. 647295 (SYMBIOSYS)).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Model Checking Boot Code from AWS Data Centers**

Byron Cook<sup>1,2</sup>, Kareem Khazem<sup>1,2</sup>, Daniel Kroening<sup>3</sup>, Serdar Tasiran<sup>1</sup>, Michael Tautschnig<sup>1,4(B)</sup>, and Mark R. Tuttle<sup>1</sup>

> <sup>1</sup> Amazon Web Services, Seattle, USA
> tautschn@amazon.com
> <sup>2</sup> University College London, London, UK
> <sup>3</sup> University of Oxford, Oxford, UK
> <sup>4</sup> Queen Mary University of London, London, UK

**Abstract.** This paper describes our experience with symbolic model checking in an industrial setting. We have proved that the initial boot code running in data centers at Amazon Web Services is memory safe, an essential step in establishing the security of any data center. Standard static analysis tools cannot be easily used on boot code without modification owing to issues not commonly found in higher-level code, including memory-mapped device interfaces, byte-level memory access, and linker scripts. This paper describes automated solutions to these issues and their implementation in the C Bounded Model Checker (CBMC). CBMC is now the first source-level static analysis tool to extract the memory layout described in a linker script for use in its analysis.

# **1 Introduction**

Boot code is the first code to run in a data center; thus, the security of a data center depends on the security of the boot code. It is hard to demonstrate boot code security using standard techniques, as boot code is difficult to test and debug, and boot code must run without the support of common security mitigations available to the operating system and user applications. This industrial experience report describes work to prove the memory safety of initial boot code running in data centers at Amazon Web Services (AWS).

We describe the challenges we faced analyzing AWS boot code, some of which render existing approaches to software verification unsound or imprecise. These challenges include memory-mapped input/output (MMIO), device behavior, and linker scripts.


Not handling MMIO or linker scripts results in imprecision (false positives), and not modeling device behavior is unsound (false negatives).

We describe the solutions to these challenges that we developed. We implemented our solutions in the C Bounded Model Checker (CBMC) [20]. We achieve soundness with CBMC by fully unrolling loops in the boot code. Our solutions automate boot code verification and require no changes to the code being analyzed. This makes our work particularly well-suited for deployment in a continuous validation environment to ensure that memory safety issues do not reappear in the code as it evolves during development. We use CBMC, but any other bit-precise, sound, automated static analysis tool could be used.

# **2 Related Work**

There are many approaches to finding memory safety errors in low-level code, from fuzzing [2] to static analysis [24,30,39,52] to deductive verification [21,34].

A key aspect of our work is soundness and precision in the presence of very low-level details. Furthermore, full automation is essential in our setting to operate in a continuous validation environment. This makes some form of model checking most appealing.

CBMC is a bounded model checker for C, C++, and Java programs, available on GitHub [13]. It features bit-precise reasoning, and it verifies array bounds (buffer overflows), pointer safety, arithmetic exceptions, and assertions in the code. A user can bound the model checking done by CBMC by specifying for each loop a maximum number of iterations. CBMC can check that it is impossible for the loop to iterate more than the specified number of times by checking a *loop-unwinding assertion*. CBMC is sound when all loop-unwinding assertions hold. Loops in boot code typically iterate over arrays of known sizes, making it possible to choose loop unwinding limits such that all loop-unwinding assertions hold (see Sect. 5.7). BLITZ [16] or F-Soft [36] could be used in place of CBMC. SATABS [19], Ufo [3], Cascade [55], Blast [9], CPAchecker [10], Corral [33,43,44], and others [18,47] might even enable unbounded verification. Our work applies to any sound, bit-precise, automated tool.

Note that boot code makes heavy use of pointers, bit vectors, and arrays, but not the heap. Thus, memory safety proof techniques based on three-valued logic [45] or separation logic as in [8] or other techniques [1,22] that focus on the heap are less appropriate since boot code mostly uses simple arrays.

KLEE [12] is a symbolic execution engine for C that has been used to find bugs in firmware. Davidson et al. [25] built the tool FIE on top of KLEE for detecting bugs in firmware programs for the MSP430 family of microcontrollers for low-power platforms, and applied the tool to nearly a hundred open source firmware programs for nearly a dozen versions of the microcontroller to find bugs like buffer overflow and writing to read-only memory. Corin and Manzano [23] used KLEE to do taint analysis and prove confidentiality and integrity properties. KLEE and other tools like SMACK [49] based on the LLVM intermediate representation do not currently support the linker scripts that are a crucial part of building boot code (see Sect. 4.5). They support partial linking by concatenating object files and resolving symbols, but fail to make available to their analysis the addresses and constants assigned to symbols in linker scripts, resulting in an imprecise analysis of the code.

S<sup>2</sup>E [15] is a symbolic execution engine for x86 binaries built on top of the QEMU [7] virtual machine and KLEE. S<sup>2</sup>E has been used on firmware. Parvez et al. [48] use symbolic execution to generate inputs targeting a potentially buggy statement for debugging. Kuznetsov et al. [42] used a prototype of S<sup>2</sup>E to find bugs in Microsoft device drivers. Zaddach et al. [56] built the tool Avatar on top of S<sup>2</sup>E to check security of embedded firmware. They test firmware running on top of actual hardware, moving device state between the concrete device and the symbolic execution. Bazhaniuk et al. [6,28] used S<sup>2</sup>E to search for security vulnerabilities in interrupt handlers for System Management Mode on Intel platforms. Experts can use S<sup>2</sup>E on firmware. One can model device behavior (see Sect. 4.2) by adding a device model to QEMU or using the signaling mechanism used by S<sup>2</sup>E during symbolic execution. One can declare an MMIO region (see Sect. 4.1) by inserting it into the QEMU memory hierarchy. Both require understanding either QEMU or S<sup>2</sup>E implementations. Our goal is to make it as easy as possible to use our work, primarily by way of automation.

Ferreira et al. [29] verify a task scheduler for an operating system, but that is high in the software stack. Klein et al. [38] prove the correctness of the seL4 kernel, but that code was written with the goal of proof. Dillig et al. [26] synthesize guards ensuring memory safety in low-level code, but our code is written by hand. Rakamarić and Hu [50] developed a conservative, scalable approach to memory safety in low-level code, but the models there are not tailored to our code that routinely accesses memory by an explicit integer-valued memory address. Redini et al. [51] built a tool called BootStomp on top of angr [54], a framework for symbolic execution of binaries based on a symbolic execution engine for the VEX intermediate representation for the Valgrind project, resulting in a powerful testing tool for boot code, but it is not sound.

# **3 Boot Code**

We define *boot code* to be the code in a cloud data center that runs from the moment the power is turned on until the BIOS starts. It runs before the operating system's boot loader that most people are familiar with. A key component to ensuring high confidence in data center security is establishing confidence in boot code security. Enhancing confidence in boot code security is a challenge because of unique properties of boot code not found in higher-level software. We now discuss these properties of boot code, and a path to greater confidence in boot code security.

### **3.1 Boot Code Implementation**

Boot code starts a sequenced boot flow [4] in which each stage locates, loads, and launches the next stage. The boot flow in a modern data center proceeds as follows: (1) When the power is turned on, before a single instruction is executed, the hardware interrogates banks of fuses and hardware registers for configuration information that is distributed to various parts of the platform. (2) *Boot code* starts up to boot a set of microcontrollers that orchestrate bringing up the rest of the platform. In a cloud data center, some of these microcontrollers are feature-rich cores with their own devices used to support virtualization. (3) The BIOS familiar to most people starts up to boot the cores and their devices. (4) A boot loader for the hypervisor launches the hypervisor to virtualize those cores. (5) A boot loader for the operating system launches the operating system itself. The security of each stage, including the operating system launched for the customer, depends on the integrity of all prior stages [27].

Ensuring boot code security using traditional techniques is hard. Visibility into code execution can only be achieved via debug ports, with almost no ability to single-step the code for debugging. UEFI (Unified Extensible Firmware Interface) [53] provides an elaborate infrastructure for debugging BIOS, but not for the boot code below BIOS in the software stack. Instrumenting boot code may be impossible because it can break the build process: the increased size of instrumented code can be larger than the size of the ROM targeted by the build process. Extracting the data collected by instrumentation may be difficult because the code has no access to a file system to record the data, and memory available for storing the data may be limited.

Static analysis is a relatively new approach to enhancing confidence in boot code security. As discussed in Sect. 2, most work applying static analysis to boot code applies technology like symbolic execution to binary code, either because the work strips the boot code from ROMs on shipping products for analysis and reverse engineering [42,51], or because code like UEFI-based implementations of BIOS loads modules with a form of dynamic linking that makes source code analysis of any significant functionality impossible [6,28]. But with access to the source code, and source code without the complexity of dynamic linking, meaningful static analysis at the source code level is possible.

### **3.2 Boot Code Security**

Boot code is a foundational component of data center security: it controls what code is run on the server. Attacking boot code is a path to booting your own code, installing a persistent root kit, or making the server unbootable. Boot code also initializes devices and interfaces directly with them. Attacking boot code can also lead to controlling or monitoring peripherals like storage devices.

The input to boot code is primarily configuration information. The runtime behavior of boot code is determined by configuration information in fuses, hardware straps, one-time programmable memories, and ROMs.

From a security perspective, boot code is susceptible to a variety of events that could set the configuration to an undesirable state. To keep any malicious adversary from modifying this configuration information, the configuration is usually locked or otherwise write-protected. Nonetheless, it is routine to discover during hardware vetting before placing hardware on a data center floor that some BIOS added by a supplier accidentally leaves a configuration register unlocked after setting it. In fact, configuration information can be intentionally unlocked for the purpose of patching and then be locked again. Any bug in a patch or in a patching mechanism has the potential to leave a server in a vulnerable configuration. Perhaps more likely than anything is a simple configuration mistake at installation. We want to know that no matter how a configuration may have been corrupted, the boot code will operate as intended and without latent exposures for potential adversaries.

The attack surface we focus on in this paper is memory safety, meaning there are no buffer overflows, no dereferencing of null pointers, and no pointers pointing into unallocated regions of memory. Code written in C is known to be at risk of memory safety errors, and boot code is almost always written in C, in part because of the direct connection between boot code and the hardware, and sometimes because of space limitations in the ROMs used to store the code.

There are many techniques for protecting against memory safety errors and mitigating their consequences at the higher levels of the software stack. Languages other than C are less prone to memory safety errors. Safe libraries can do bounds checking for standard library functions. Extensions to compilers like gcc and clang can help detect buffer overflow when it happens (which is different from keeping it from happening). Address space layout randomization makes it harder for the adversary to make reliable use of a vulnerability. None of these mitigations, however, apply to firmware. Firmware is typically built using the tool chain that is provided by the manufacturer of the microcontroller, and firmware typically runs before the operating system starts, without the benefit of operating system support like a virtual machine or randomized memory layout.

### **4 Boot Code Verification Challenges**

Boot code poses challenges to the precision, soundness, and performance of any analysis tool. The C standard [35] says, "A volatile declaration may be used to describe an object corresponding to an MMIO port" and "what constitutes an access to an object that has volatile-qualified type is implementation-defined." Any tool that seeks to verify boot code must provide means to model what the C standard calls *implementation-defined behavior*. Of all such behavior, MMIO and device behavior are most relevant to boot code. In this section, we discuss these issues and the solutions we have implemented in CBMC.

### **4.1 Memory-Mapped I/O**

Boot code accesses a device through *memory-mapped input/output* (MMIO). Registers of the device are mapped to specific locations in memory. Boot code reads or writes a register in the device by reading or writing a specific location in memory. If boot code wants to set the second bit in a configuration register, and if that configuration register is mapped to the byte at location 0x1000 in memory, then the boot code sets the second bit of the byte at 0x1000. The problem posed by MMIO is that there is no declaration or allocation in the source code specifying this location 0x1000 as a valid region of memory. Nevertheless accesses within this region are valid memory accesses, and should not be flagged as an out-of-bounds memory reference. This is an example of implementation-defined behavior that must be modeled to avoid reporting false positives.

To facilitate analysis of low-level code, we have added to CBMC a built-in function

```
__CPROVER_allocated_memory(address, size)
```
to mark ranges of memory as valid. Accesses within this region are exempt from the out-of-bounds assertion checking that CBMC would normally do. The function declares the half-open interval [address, address + size) as valid memory that can be read and written. This function can be used anywhere in the source code, but is most commonly used in the test harness. (CBMC, like most program analysis approaches, uses a test harness to drive the analysis.)

### **4.2 Device Behavior**

An MMIO region is an interface to a device. It is unsound to assume that the values returned by reading and writing this region of memory follow the semantics of ordinary read-write memory. Imagine a device that can generate unique ids. If the register returning the unique id is mapped to the byte at location 0x1000, then reading location 0x1000 will return a different value every time, even without intervening writes. These side effects have to be modeled. One easy approach is to 'havoc' the device, meaning that writes are ignored and reads return nondeterministic values. This is sound, but may lead to too many false positives. We can model the device semantics more precisely, using one of the options described below.

If the device has an API, we havoc the device by making use of a more general functionality we have added to CBMC. We have added a command-line option

```
--remove-function-body device_access
```

to CBMC's goto-instrument tool. When used, this drops the implementation of the function device\_access from the compiled object code. If there is no other definition of device\_access, CBMC will model each invocation of device\_access as returning an unconstrained value of the appropriate return type. Now, to havoc a device whose API includes a read and a write method, we can use this command-line option to remove their function bodies, and CBMC will model each invocation of read as returning an unconstrained value.

At link time, if another object file, such as the test harness, provides a second definition of device\_access, CBMC will use that definition in its place. Thus, to model device semantics more precisely, we can provide a device model in the test harness by supplying implementations of (or approximations for) the methods in the API.

If the device has no API, meaning that the code refers directly to addresses in the MMIO region for the device without going through accessor functions, we have another method. We have added two function symbols

```
__CPROVER_mm_io_r(address, size)
__CPROVER_mm_io_w(address, size, value)
```

to CBMC to model reads and writes of memory at a fixed integer address. If the test harness provides implementations of these functions, CBMC will use them to model every read or write of memory. For example, defining

```
char __CPROVER_mm_io_r( void *a, unsigned s) {
  if(a == (void *)0x1000) return 2;
}
```
will return the value 2 upon any access at address 0x1000, and return a nondeterministic value in all other cases.

In both cases, with or without an API, we can thus establish a sound and, if needed, precise analysis of this aspect of implementation-defined behavior.

### **4.3 Byte-Level Memory Access**

It is common for boot code to access memory a byte at a time, and to access a byte that is not part of any variable or data structure declared in the program text. Accessing a byte in an MMIO region is the most common example. Boot code typically accesses this byte in memory by computing the address of the byte as an integer value, coercing this integer to a pointer, and dereferencing this pointer to access that byte. Boot code references memory by this kind of explicit address far more frequently than it references memory via some explicitly allocated variable or data structure. Any tool analyzing boot code must have a method for reasoning efficiently about accessing an arbitrary byte of memory.

The natural model for memory is an array of bytes, and CBMC uses exactly that. Any decision procedure with a well-engineered implementation of a theory of arrays is likely to do a good job of modeling byte-level memory access. We improved CBMC's decision procedure for arrays to follow the state-of-the-art algorithm [17,40]. The key data structure is a weak equivalence graph whose vertices correspond to array terms. Given an equality *a* = *b* between two array terms *a* and *b*, we add an unlabeled edge between *a* and *b*. Given an update *a*{*i* ← *v*} of an array term *a*, we add an edge labeled *i* between *a* and *a*{*i* ← *v*}. Two array terms *a* and *b* are weakly equivalent if there is a path from *a* to *b* in the graph; they must then be equal at all indices except those updated along the path. This graph is used to encode constraints on array terms for the solver. For simplicity, our implementation generates these constraints eagerly.

### **4.4 Memory Copying**

One of the main jobs of any stage of the boot flow is to copy the next stage into memory, usually using some variant of memcpy. Any tool analyzing boot code must have an efficient model of memcpy. Modeling memcpy as a loop iterating through a thousand bytes of memory leads to performance problems during program analysis. We added to CBMC an improved model of the memset and memcpy library functions.

Boot code has no access to a C library. In our case, the boot code shipped an iterative implementation of memset and memcpy. CBMC's model of the C library previously also used an iterative model. We replaced this iterative model of memset and memcpy with a single array operation that can be handled efficiently by the decision procedure at the back end. We instructed CBMC to replace the boot code implementations with the CBMC model using the --remove-function-body command-line option described in Sect. 4.2.

### **4.5 Linker Scripts**

*Linking* is the final stage in the process of transforming source code into an executable program. Compilation transforms source files into object files, which consist of several *sections* of related object code. A typical object file contains sections for executable code, read-only and read-write program data, debugging symbols, and other information. The linker combines several object files into a single executable object file, merging similar sections from each of the input files into single sections in the output executable. The linker combines and arranges the sections according to the directives in a *linker script*. Linker scripts are written in a declarative language [14].

The functionality of most programs is not sensitive to the exact layout of the executable file; therefore, by default, the linker uses a generic linker script<sup>1</sup> the directives of which are suited to laying out high-level programs. On the other hand, low-level code (like boot loaders, kernels, and firmware) must often be hard-coded to address particular memory locations, which necessitates the use of a custom linker script.

One use for a linker script is to place selected code into a specialized memory region like a *tightly-coupled memory* unit [5], which is a fast cache into which developers can place hot code. Another is device access via memory-mapped I/O, as discussed in Sects. 4.1 and 4.2. Low-level programs address such hardware devices by having a variable whose address in memory corresponds to the address that the hardware exposes. However, no programming language offers the ability to set a variable's address from within the program; the variable must instead be laid out at the right place in the object file, using linker script directives.

While linker scripts are essential to implement the functionality of low-level code, their use in higher-level programs is uncommon. Thus, we know of no work that considers the role of linker scripts in static program analysis; a recent formal treatment of linkers [37] explicitly skips linker scripts. Ensuring that static analysis results remain correct in the presence of linker scripts is vital to verifying and finding bugs in low-level code; we next describe problems that linker scripts can create for static analyses.

**Linker Script Challenges.** All variables used in C programs must be *defined* exactly once. Static analyses make use of the values of these variables to decide program correctness, provided that the source code of the program and the libraries used is available. However, linker scripts also define symbols that can be accessed as variables from C source code. Since C code never defines these symbols, and linker scripts are not written in C, the values of these symbols are unknown to a static analyzer that is oblivious to linker scripts. If the correctness of code depends on the values of these symbols, it cannot be verified. To make this discussion concrete, consider the code in Fig. 1.

<sup>1</sup> On Linux and macOS, running ld --verbose displays the default linker script.

```
/* main.c */
#include <string.h>
extern char text_start;
extern char text_size;
extern char scratch_start;
int main() {
  memcpy(&text_start,
         &scratch_start,
         (size_t)&text_size);
}
```

```
/* link.ld */
SECTIONS {
  .text : {
    text_start = .;
    *(.text)
  }
  text_size = SIZEOF(.text);
  .scratch : {
    scratch_start = .;
    . = . + 0x1000;
    scratch_end = .;
  }
}
```
**Fig. 1.** A C program using variables whose addresses are defined in a linker script.

This example, adapted from the GNU linker manual [14], shows the common pattern of copying an entire region of program code from one part of memory to another. The linker writes an executable file in accordance with the linker script on the right; the expression "." (period) indicates the current byte offset into the executable file. The script directs the linker to generate a code section called .text and write the contents of the .text sections from each input file into that section; and to create an empty 4 KiB long section called .scratch. The symbols text\_start and scratch\_start are created at the address of the beginning of the associated section. Similarly, the symbol text\_size is created at the address equal to the code size of the .text section. Since these symbols are defined in the linker script, they can be freely used from the C program on the left (which must declare the symbols as extern, but not define them). While the data at the symbols' locations is likely garbage, the symbols' *addresses* are meaningful; in the program, the addresses are used to copy data from one section to another.

Contemporary static analysis tools fail to correctly model the behavior of this program because they model symbols defined in C code but not in linker scripts. Tools like SeaHorn [32] and KLEE [12] do support linking of the intermediate representation (IR) compiled from each of the source files with an IR linker. By using build wrappers like wllvm [46], they can even invoke the native system linker, which itself runs the linker script on the machine code sections of the object files. The actions of the native linker, however, are not propagated back to the IR linker, so the linked IR used for static analysis contains only information derived from C source, and not from linker scripts. As a result, these analyzers lack the required precision to prove that a safe program is safe: they generate false positives because they have no way of knowing (for example) that a memcpy is walking over a valid region of memory defined in the linker script.

**Information Required for Precise Modeling.** As we noted earlier in this section, linker scripts provide definitions to variables that may only be declared in C code, and whose addresses may be used in the program. In addition, linker scripts define the layout of code sections; the C program may copy data to and from these sections using variables defined in the linker script to demarcate valid regions inside the sections. Our aim is to allow the static analyzer to decide the memory safety of operations that use linker script definitions (if indeed they are safe, i.e., don't access memory regions outside those defined in the linker script). To do this, the analyzer must know (referencing our example in Fig. 1 but without loss of generality):

1. which symbols the program declares but never defines (here, text\_start, text\_size, and scratch\_start);
2. which of these symbols the linker script defines, and which section starts, ends, and sizes their addresses denote;
3. the concrete addresses at which the linker places these symbols in the final executable.
Fact 1 is derived from the source code; Fact 2 from parsing the linker script; and Fact 3 from disassembling the fully-linked executable, in which the linker has laid out the sections and symbols at their final addresses.

**Extending CBMC.** CBMC compiles source files with a front-end that emulates the native compiler (gcc), but which adds an additional section to the end of the output binary [41]; this section contains the program encoded in CBMC's analysis-friendly intermediate representation (IR). In particular, CBMC's front-end takes the linker script as a command-line argument, just like gcc, and delegates the final link to the system's native linker. CBMC thus has access to the linker script and the final binary, which contains both native executable code and CBMC IR. We send linker script information to CBMC as follows:

1. Compile the program with CBMC's front-end, passing the linker script through to the system's native linker as usual.
2. Parse the linker script to extract the symbols that delimit each code section.
3. Extract the concrete addresses of these symbols from the fully-linked binary and augment the IR with definitions for them.
Our extensions are Steps 2 and 3, which we describe in more detail below. They are applicable to tools (like SeaHorn and KLEE) that use an IR linker (like llvm-link) before analyzing the IR.

**Extracting Linker Script Symbols.** Our extension to CBMC reads a linker script and extracts the information that we need. For each code section, it extracts the symbols whose addresses mark the start and end of the section, if any; and the symbol whose address indicates the section size, if any. The sections key of Fig. 2 shows the information extracted from the linker script in Fig. 1.

**Extracting Linker Script Symbol Addresses.** To remain architecture independent, our extension uses the objdump program (part of the GNU Binutils [31]) to extract the addresses of all symbols in an object file (shown in the addresses key of Fig. 2). In this way, it obtains the concrete addresses of symbols defined in the linker script.

```
"sections": {
  ".text": {
    "start": "text_start",
    "size": "text_size"
  },
  ".scratch": {
    "start": "scratch_start",
    "end": "scratch_end"
  }
},
"addresses": {
  "text_start": "0x0200",
  "text_size": "0x0600",
  "scratch_start": "0x1000",
  "scratch_end": "0x2000"
}
```
**Fig. 2.** Output from our linker script parser when run on the linker script in Fig. 1, on a binary with a 1 KiB .text section and 4 KiB .scratch section.

**Augmenting the Intermediate Representation.** CBMC maintains a symbol table of all the variables used in the program. Variables that are declared extern in C code and never defined have no initial value in the symbol table. CBMC can still analyze code that contains undefined symbols, but as noted earlier in this section, this can lead to incorrect verification results. Our extension to CBMC extracts information described in the previous section and integrates it into the target program's IR. For example, given the source code in Fig. 1, CBMC will replace it with the code given in Fig. 3.

In more detail, CBMC

1. changes the IR type of each linker-script-defined symbol from char to char \*;
2. sets the value of each such symbol to its concrete address in the binary;
3. rewrites every occurrence of "&symbol" to "symbol";
4. marks each linker-script-defined region as allocated using \_\_CPROVER\_allocated\_memory.
The first two steps are necessary because C will not let us set the address of a variable, but will let us store the address in a variable. CBMC thus changes the IR type of text\_start to char \*; sets the value of text\_start to the address of text\_start in the binary; and rewrites all occurrences of "&text\_start" to "text\_start". This preserves the original semantics while allowing CBMC to model the program. The semantics of Step 4 is impossible to express in C, justifying the use of CBMC rather than a simple source-to-source transformation.

```
/* original */
#include <string.h>
extern char text_start;
extern char text_size;
extern char scratch_start;
int main() {
  memcpy(&text_start,
         &scratch_start,
         (size_t)&text_size);
}
```

```
/* after transformation */
#include <string.h>
char *text_start = 0x0200;
char *text_size = 0x0600;
char *scratch_start = 0x1000;
int main() {
  __CPROVER_allocated_memory(
    0x0200, 0x0600);
  __CPROVER_allocated_memory(
    0x1000, 0x1000);
  memcpy(text_start,
         scratch_start,
         (size_t)text_size);
}
```
**Fig. 3.** Transformation performed by CBMC for linker-script-defined symbols.

# **5 Industrial Boot Code Verification**

In this section, we describe our experience proving memory safety of boot code running in an AWS data center. We give an exact statement of what we proved, we point out examples of the verification challenges mentioned in Sect. 4 and our solutions, and we go over the test harness and the results of running CBMC.

**Fig. 4.** Boot code is free of memory safety errors.

We use CBMC to prove that 783 lines of AWS boot code are memory safe. Soundness of this proof by bounded model checking is achieved by having CBMC check its loop unwinding assertions (that loops have been sufficiently unwound). This boot code proceeds in two stages, as illustrated in Fig. 4. The first stage prepares the machine, loads the second stage from a boot source, and launches the second stage. The behavior of the first stage is controlled by configuration information in hardware straps and one-time-programmable memory (OTP), and by device configuration. We show that no configuration will induce a memory safety error in the stage 1 boot code.

More precisely, we prove:

Assuming


then


the stage 1 boot code will not exhibit any memory safety errors.

Due to the second and third assumptions, we may be missing memory safety errors in these simple procedures. Memory safety of these procedures can be established in isolation. We find all memory safety errors in the remainder of the code, however, because making buffers smaller increases the chances they will overflow, and allowing methods to return unconstrained values increases the set of program behaviors considered.

The code we present in this section is representative of the code we analyzed, but the actual code is proprietary and not public. The open-source project rBoot [11], some 700 lines of publicly available boot code, exhibits most of the challenges we now discuss.

### **5.1 Memory-Mapped I/O**

MMIO regions are not explicitly allocated in the code, but the addresses of these regions appear in the header files. For example, an MMIO region for the hardware straps is given with


Each of the last two macros denotes the start of a different MMIO region, leaving 0x14 bytes for the region named REG\_BOOT\_STRAP. Using the built-in function added to CBMC (Sect. 4.1), we declare this region in the test harness with

```
__CPROVER_allocated_memory(REG_BOOT_STRAP, 0x14);
```
### **5.2 Device Behavior**

All of the devices accessed by the boot code are accessed via an API. For example, the API for the UART is given by

```
int UartInit(UART_PORT port, unsigned int baudRate);
void UartWriteByte(UART_PORT port, uint8_t byte);
uint8_t UartReadByte(UART_PORT port);
```
In this work, we havoc all of the devices to make our result as strong as possible. In other words, our device model allows a device read to return any value of the appropriate type, and still we can prove that (even in the context of a misbehaving device) the boot code does not exhibit a memory safety error. Because all devices have an API, we can havoc the devices using the command line option added to CBMC (Sect. 4.2), and invoke CBMC with

```
--remove-function-body UartInit
--remove-function-body UartWriteByte
--remove-function-body UartReadByte
```
### **5.3 Byte-Level Memory Access**

All devices are accessed at the byte level by computing an integer-valued address and coercing it to a pointer. For example, the following code snippets from BootOptionsParse show how reading the hardware straps from the MMIO region discussed above translates into a byte-level memory access.

```
#define REG_READ(addr) (*(volatile uint32_t *)(addr))

regVal = REG_READ(REG_BOOT_STRAP);
```

In CBMC, this translates into an access into an array modeling memory at location 0x1000 + 0x110. Our optimized encoding of the theory of arrays (Sect. 4.3) enables CBMC to reason more efficiently about this kind of construct.

### **5.4 Memory Copying**

The memset and memcpy procedures are heavily used in boot code. For example, the function used to copy the stage 2 boot code from flash memory amounts to a single, large memcpy.

```
int SNOR_Read(unsigned int address ,
               uint8_t* buff ,
               unsigned int numBytes) {
  ...
  memcpy(buff ,
         (void*)( address + REG_SNOR_BASE_ADDRESS),
          numBytes);
  ...
}
```
CBMC reasons more efficiently about this kind of code due to our loop-free model of memset and memcpy procedures as array operations (Sect. 4.4).

### **5.5 Linker Scripts**

Linker scripts allocate regions of memory and pass the addresses of these regions and other constants to the code through the symbol table. For example, the linker script defines a region to hold the stage 2 binary and passes the address and size of the region as the addresses of the symbols stage2\_start and stage2\_size.

```
.stage2 (NOLOAD) : {
  stage2_start = .;
  . = . + STAGE2_SIZE;
  stage2_end = .;
} > RAM2
stage2_size = SIZEOF(.stage2);
```
The code declares the symbols as externally defined, and uses a pair of macros to convert the addresses of the symbols to an address and a constant before use.

```
extern char stage2_start[];
extern char stage2_size[];
#define STAGE2_ADDRESS ((uint8_t *)(&stage2_start))
#define STAGE2_SIZE    ((unsigned)(&stage2_size))
```
CBMC's new approach to handling linker scripts modifies the CBMC intermediate representation of this code as described in Sect. 4.5.

### **5.6 Test Harness**

The main procedure for the boot code begins by clearing the BSS section, copying a small amount of data from a ROM, printing some debugging information, and invoking three functions

```
SecuritySettingsOtp();
BootOptionsParse();
Stage2LoadAndExecute();
```
that read security settings from some one-time programmable memory, read the boot options from some hardware straps, and load and launch the stage 2 code.

The test harness for the boot code is 76 lines of code that look similar to

```
void environment_model() {
  __CPROVER_allocated_memory(REG_BOOT_STRAP, 0x14);
  __CPROVER_allocated_memory(REG_UART_UART_BASE,
                             UART_REG_OFFSET_LSR +
                             sizeof(uint32_t));
  __CPROVER_allocated_memory(REG_NAND_CONFIG_REG,
                             sizeof(uint32_t));
}

void harness() {
  environment_model();
  SecuritySettingsOtp();
  BootOptionsParse();
  Stage2LoadAndExecute();
}
```
The environment\_model procedure defines the environment of the software under test not declared in the boot code itself. This environment includes more than 30 MMIO regions for hardware like some hardware straps, a UART, and some NAND memory. The fragment of the environment model reproduced above uses the \_\_CPROVER\_allocated\_memory built-in function added to CBMC for this work to declare these MMIO regions and assign them unconstrained values (modeling unconstrained configuration information). The harness procedure is the test harness itself. It builds the environment model and calls the three procedures invoked by the boot code.

### **5.7 Running CBMC**

Building the boot code and test harness for CBMC takes 8.2 s, compared with 2.2 s to build the boot code with gcc.

Run on the test harness above as a job under AWS Batch, CBMC finished successfully in 10:02 min. It ran on a 16-core server with 122 GiB of memory running Ubuntu 14.04, and consumed one core at 100%, using 5 GiB of memory. The new encoding of arrays improved this time by 45 s.

The boot code comprises 783 lines of statically reachable code, i.e., lines in functions that are reachable from the test harness in the function call graph. CBMC achieves complete code coverage, in the sense that every line of code CBMC fails to exercise is dead code. An example of dead code found in the boot code is the default case of a switch statement whose cases enumerate all possible values of an expression.

The boot code contains 98 loops, which fall into two classes. First are for-loops with constant-valued expressions for the upper and lower bounds. Second are loops of the form while (num) {...; num--;}, where code inspection yields a bound on num. Thus, it is possible to choose loop bounds that cause all loop-unwinding assertions to hold, making CBMC's results sound for the boot code.

# **6 Conclusion**

This paper describes industrial experience with model checking production code. We extended CBMC to address issues that arise in boot code, and we proved that initial boot code running in data centers at Amazon Web Services is memory safe, a significant industrial application of model checking. Our most significant extension to CBMC was parsing linker scripts to extract the memory layout they describe for use in model checking, making CBMC the first static analysis tool to do so. With this and our other extensions to CBMC supporting devices and byte-level access, CBMC can now be used in a continuous validation flow to check for memory safety during code development. All of these extensions are in the public domain and freely available for immediate use.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Android Stack Machine**

Taolue Chen1,6, Jinlong He2,5, Fu Song<sup>3</sup>, Guozhen Wang<sup>4</sup>, Zhilin Wu2(B) , and Jun Yan2,5

<sup>1</sup> Birkbeck, University of London, London, UK
<sup>2</sup> State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
wuzl@ios.ac.cn
<sup>3</sup> ShanghaiTech University, Shanghai, China
<sup>4</sup> Beijing University of Technology, Beijing, China

<sup>5</sup> University of Chinese Academy of Sciences, Beijing, China <sup>6</sup> State Key Laboratory of Novel Software Technology,

Nanjing University, Nanjing, China

**Abstract.** In this paper, we propose Android Stack Machine (ASM), a formal model to capture key mechanisms of Android multi-tasking such as activities, back stacks, launch modes, and task affinities. The model is based on pushdown systems with multiple stacks, and focuses on the evolution of the back stack of the Android system when interacting with activities carrying specific launch modes and task affinities. For formal analysis, we study the reachability problem of ASM. While the general problem is shown to be undecidable, we identify expressive fragments for which various verification techniques for pushdown systems or their extensions are harnessed to show decidability of the problem.

# **1 Introduction**

Multi-tasking plays a central role in the Android platform. Its unique design, via activities and back stacks, greatly facilitates organizing user sessions through tasks, and provides rich features such as handy application switching, background app state maintenance, and smooth task-history navigation using the "back" button [16]. We refer the reader to Sect. 2 for an overview.

The Android task management mechanism has substantially enhanced the user experience of the Android system and promoted personalized features in app design. However, the mechanism is also notoriously difficult to understand. As a witness, it constantly baffles app developers and has become a common topic on question-and-answer websites (for instance, [2]). Surprisingly, the Android multi-tasking

This work was partially supported by UK EPSRC grant (EP/P00430X/1), ARC grants (DP160101652, DP180100691), NSFC grants (61532019, 61761136011, 61662035, 61672505, 61472474, 61572478) and the National Key Basic Research (973) Program of China (2014CB340701), the INRIA-CAS joint research project "Verification, Interaction, and Proofs", and Key Research Program of Frontier Sciences, CAS, Grant No. QYZDJ-SSW-JSC036.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 487–504, 2018. https://doi.org/10.1007/978-3-319-96142-2_29

mechanism, despite its importance, has not been thoroughly studied before, let alone given a formal treatment. This has impeded further development of computer-aided (static) analysis and verification for Android apps, which are indispensable for vulnerability analysis (for example, detection of task hijacking [16]) and app performance enhancement (for example, estimation of energy consumption [8]).

This paper provides a formal model, i.e., *Android Stack Machine* (ASM), aiming to capture the key features of Android multi-tasking. ASM addresses the behavior of Android *back stacks*, a key component of the multi-tasking machinery, and their interplay with attributes of the activity. In this paper, for these attributes we consider four basic *launch modes*, i.e., standard (STD), singleTop (STP), singleTask (STK), singleInstance (SIT), and *task affinities*. (For simplicity, more complicated activity attributes such as *allowTaskReparenting* will not be addressed in the present paper.) We believe that the semantics of ASM, specified as a transition system, faithfully captures the actual mechanism of Android systems. For each case of the semantics, we have created "diagnosis" apps with corresponding launch modes and task affinities, and carried out extensive experiments using these apps, ascertaining its conformance to the Android platform. (Details will be provided in Sect. 3.)

Technically, ASM can be viewed as an Android counterpart of pushdown systems with multiple stacks, which are the *de facto* model for (multi-threaded) concurrent programs. Being rigorous, this model opens a door towards a formal account of Android's multi-tasking mechanism, which would greatly facilitate developers' understanding, freeing them from lengthy, ambiguous, elusive Android documentation. We remark that the evolution of Android back stacks can also be affected by the *intent flags* of the activities. ASM does not address intent flags explicitly. However, the effects of most intent flags (e.g., FLAG\_ACTIVITY\_NEW\_TASK, FLAG\_ACTIVITY\_CLEAR\_TOP) can be simulated by launch modes, so this is *not* a real limitation of ASM.

Based on ASM, we also make a first step towards the formal analysis of Android multi-tasking apps by investigating the *reachability problem*, which is fundamental to all such analyses. As ASM is akin to pushdown systems with multiple stacks, it is perhaps not surprising that the problem is undecidable in general; in fact, we show undecidability for most interesting fragments, even with just two launch modes. Seeking expressive, practice-relevant decidable fragments, we identify the fragment of STK-**dominating ASM**, which assumes that STK activities have pairwise distinct task affinities and which further restricts the use of SIT activities. This fragment covers the majority of open-source Android apps (e.g., from GitHub) we have found so far. One of our technical contributions is a decision procedure for the reachability problem of STK-dominating ASM, which combines a range of techniques, from simulation by pushdown systems with transductions [19] to abstraction methods for multiple stacks. Apart from its independent interest in the study of multi-stack pushdown systems, this work lays a solid foundation for further (static) analysis and verification of Android apps related to multi-tasking, enabling model checking of Android apps, security analysis (such as discovering task hijacking), and typical tasks in software engineering such as automatic debugging and model-based testing.

We summarize the main contributions as follows: (1) We propose—to the best of our knowledge—the first comprehensive formal model, the Android stack machine, for Android back stacks, which is validated by extensive experiments. (2) We study the reachability problem for Android stack machines. Apart from strongest-possible undecidability results in the general case, we provide a decision procedure for a practically relevant fragment.

# **2 Android Stack Machine: An Informal Overview**

In Android, an application, usually referred to as an *app*, is regarded as a collection of *activities*. An activity is a type of app component; an instance of an activity provides a graphical user interface on the screen and serves as the entry point for interacting with the user [1]. An app typically has many activities for different user interactions (e.g., dialling phone numbers, reading contact lists, etc.). A distinguished activity is the *main* activity, which is started when the app is launched. A *task* is a collection of activities that users interact with when performing a certain job. The activities in a task are arranged in a stack, in the order in which they are opened. For example, an email app might have one activity to show a list of the latest messages. When the user selects a message, a new activity opens to view that message; this new activity is pushed onto the stack. If the user presses the "Back" button, the current activity is finished and popped off the stack. [In practice, the onBackPressed() method can be overridden and is triggered when the "Back" button is clicked. Here we assume—as a model abstraction—that the onBackPressed() method is not overridden.] Furthermore, multiple tasks may run concurrently on the Android platform, and the *back stack* stores all the tasks as a stack as well. In other words, it has a nested structure: a stack of stacks (tasks). We remark that in Android, activities from different apps can stay in the same task, and activities from the same app can enter different tasks.
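The push/pop discipline just described can be sketched concretely. The following is a minimal illustration of the nested stack-of-stacks structure; the activity names and helper functions are ours, not Android APIs:

```python
# A minimal sketch of the nested back-stack structure described above:
# the back stack is a stack of tasks, and each task is a stack of
# activity names. All names here are illustrative, not Android APIs.

back_stack = []  # list of tasks; index 0 is the top (foreground) task

def launch_task(activity):
    """Start a new task whose root (bottom) activity is `activity`."""
    back_stack.insert(0, [activity])

def push(activity):
    """Open `activity` on top of the current (top) task."""
    back_stack[0].insert(0, activity)

def back():
    """Simulate the "Back" button: pop the top activity and drop the
    task when it becomes empty (the next task then surfaces)."""
    back_stack[0].pop(0)
    if not back_stack[0]:
        back_stack.pop(0)

launch_task("MessageList")   # email app starts with its main activity
push("ViewMessage")          # user selects a message
back()                       # "Back": ViewMessage is popped
```

After this trace the back stack holds a single task containing only the main activity, mirroring the email example in the text.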

Typically, the evolution of the back stack depends mainly on two attributes of activities: *launch modes* and *task affinities*. All the activities of an app, as well as their attributes, including the launch modes and task affinities, are defined in the *manifest file* of the app. The launch mode of an activity determines the operation performed on the back stack when the activity is launched. As mentioned in Sect. 1, there are four basic launch modes in Android: "standard", "singleTop", "singleTask", and "singleInstance". The task affinity of an activity indicates to which task the activity prefers to belong. By default, all the activities of an app have the same affinity (i.e., they all prefer to be in the same task), but the default affinity of an activity can be modified: activities defined in different apps can share a task affinity, and activities defined in the same app can be assigned different task affinities. Below we use a simple app to demonstrate the evolution of the back stack.

*Example 1.* In Fig. 1, an app ActivitiesLaunchDemo<sup>1</sup> is illustrated. The app contains four activities with the launch modes STD, STP, STK and SIT, depicted

<sup>1</sup> Adapted from an open-source app https://github.com/wauoen/LaunchModeDemo.

by green, blue, yellow and red, respectively. We will use these colours to name the activities. The green, blue and red activities have the same task affinity, while the yellow activity has a distinct one. The *main activity* of the app is the green activity. Each activity contains four buttons, namely a green, a blue, a yellow and a red button. When a button is clicked, an instance of the activity of that colour is started. Moreover, the identifiers of all the tasks of the back stack, as well as their contents, are shown in the white zones of the window. We use the following execution trace to demonstrate how the back stack evolves according to the launch modes and the task affinities of the activities: the user clicks the buttons in the order green, blue, blue, yellow, red, and green.


# **3 Android Stack Machine**

For k ∈ ℕ, let [k] = {1, ..., k}. For a function f : X → Y, let dom(f) and rng(f) denote the domain (X) and range (Y) of f, respectively.

**Fig. 1.** ActivitiesLaunchDemo: the running example (Color figure online)

**Definition 1 (Android stack machine).** *An* Android stack machine (ASM) *is a tuple* A = (Q, Sig, q₀, Δ)*, where* Q *is a finite set of states,* q₀ ∈ Q *is the initial state,* Δ *is a finite set of transitions, and the signature* Sig = (Act, Lmd, Aft, A₀) *is such that*

	- Act *is a finite set of activities,*
	- Lmd : Act → {STD, STP, STK, SIT} *is the launch-mode function,*
	- Aft : Act → [m] *is the task-affinity function, where* m = |Act|*,*
	- A₀ ∈ Act *is the* main *activity.*

For convenience, we usually write a transition (q, A, α, q′) ∈ Δ as q −(A,α)→ q′, and (q, 𝔟, α, q′) ∈ Δ as q −(𝔟,α)→ q′. Intuitively, 𝔟 denotes an empty back stack, the action − denotes that there is no change to the back stack, the action back denotes the pop action, and the action start(A) denotes that the activity A is started. We assume that, if the back stack is empty, the Android stack machine terminates (i.e., no further continuation is possible) unless it is in the initial state q₀. We use Act_ℓ to denote {B ∈ Act | Lmd(B) = ℓ} for ℓ ∈ {STD, STP, STK, SIT}.

*Semantics.* Let A = (Q, Sig, q₀, Δ) be an ASM with Sig = (Act, Lmd, Aft, A₀).

A *task* of A is encoded as a word S = [A₁, ..., Aₙ] ∈ Act⁺ which denotes the content of the stack, with A₁ (resp. Aₙ) as the top (resp. bottom) symbol, denoted by top(S) (resp. btm(S)). **We also call the bottom activity of a non-empty task** *S* **the** *root* **activity of the task.** (Intuitively, this is the *first* activity of the task.) For ℓ ∈ {STD, STP, STK, SIT}, a task S is called an ℓ-*task* if Lmd(btm(S)) = ℓ. We define the *affinity of a task* S, denoted by Aft(S), to be Aft(btm(S)). For S₁ ∈ Act* and S₂ ∈ Act*, we use S₁ · S₂ to denote the concatenation of S₁ and S₂, and ε is used to denote the empty word in Act*.

As mentioned in Sect. 2, the (running) tasks in Android are organized into the *back stack*, which is the main object modelled by ASM. Typically, we write a back stack ρ as *a sequence of non-empty tasks*, i.e., ρ = (S₁, ..., Sₙ), where S₁ and Sₙ are called the top and the bottom task, respectively. (Intuitively, S₁ is the currently active task.) ε is used to denote the empty back stack. For a non-empty back stack ρ = (S₁, ..., Sₙ), we overload top by using top(ρ) to refer to the task S₁, and top²(ρ) to refer to the top activity of S₁.

**Definition 2 (Configurations).** A configuration *of* A *is a pair* (q, ρ) *where* q ∈ Q *and* ρ *is a back stack. Assume that* ρ = (S₁, ..., Sₙ) *with* Sᵢ = [Aᵢ,₁, ..., Aᵢ,ₘᵢ] *for each* i ∈ [n]*. We require* ρ *to satisfy the following constraints:*


By Definition 2(5), each back stack ρ contains at most |Act_SIT| + |rng(Aft)| (more precisely, |Act_SIT| + |{Aft(A) | A ∈ Act \ Act_SIT}|) tasks. Moreover, by Definition 2(1–5), all the root activities in a configuration are pairwise distinct, which allows us to refer to a task whose root activity is A as *the* A-task.

Let Conf_A denote the set of configurations of A. The *initial* configuration of A is (q₀, ε). To formalize the semantics of A concisely, we introduce the following shorthand stack operations and one auxiliary function. Here ρ = (S₁, ..., Sₙ) is a non-empty back stack.

- Noaction(ρ) ≡ ρ
- Push(ρ, B) ≡ (([B] · S₁), S₂, ..., Sₙ)
- NewTask(B) ≡ ([B])
- NewTask(ρ, B) ≡ ([B], S₁, ..., Sₙ)
- Pop(ρ) ≡ ε, if n = 1 and S₁ = [A]; (S₂, ..., Sₙ), if n > 1 and S₁ = [A]; (S₁′, S₂, ..., Sₙ), if S₁ = [A] · S₁′ with S₁′ ∈ Act⁺
- PopUntil(ρ, B) ≡ (S₁″, S₂, ..., Sₙ), where S₁ = S₁′ · S₁″ with S₁′ ∈ (Act \ {B})* and top(S₁″) = B
- Move2Top(ρ, i) ≡ (Sᵢ, S₁, ..., Sᵢ₋₁, Sᵢ₊₁, ..., Sₙ)
- GetNonSITTaskByAft(ρ, k) ≡ Sᵢ, if Aft(Sᵢ) = k and Lmd(btm(Sᵢ)) ≠ SIT; Undef, otherwise

Intuitively, GetNonSITTaskByAft(ρ, k) returns the non-SIT task whose affinity is k if such a task exists, and Undef otherwise.
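For concreteness, these operations can be transcribed almost verbatim. The following sketch is one possible reading under our own encoding: a back stack is a list of tasks (topmost first), each task a list of activities (topmost first), and `aft`/`lmd` are illustrative stand-ins for Aft and Lmd:

```python
UNDEF = None  # stands for Undef

def push(rho, B):                        # Push(rho, B)
    return [[B] + rho[0]] + rho[1:]

def new_task(rho, B):                    # NewTask(rho, B)
    return [[B]] + rho

def pop(rho):                            # Pop(rho)
    if len(rho[0]) == 1:                 # top task holds a single activity:
        return rho[1:]                   # drop the whole task (possibly -> empty)
    return [rho[0][1:]] + rho[1:]

def pop_until(rho, B):                   # PopUntil(rho, B)
    S1 = rho[0]
    return [S1[S1.index(B):]] + rho[1:]  # S1 = S1' . S1'' with top(S1'') = B

def move_to_top(rho, i):                 # Move2Top(rho, i); 0-based index here
    return [rho[i]] + rho[:i] + rho[i + 1:]

def get_non_sit_task_by_aft(rho, k, aft, lmd):
    """GetNonSITTaskByAft: the (unique) non-SIT task with affinity k, if any.
    A task's affinity and launch mode are those of its root (bottom) activity."""
    for i, S in enumerate(rho):
        if aft[S[-1]] == k and lmd[S[-1]] != "SIT":
            return i                     # return the index rather than the task
    return UNDEF
```

Returning an index from `get_non_sit_task_by_aft` (rather than the task itself) is a convenience so that the result can be fed directly to `move_to_top`.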

In the sequel, we define the transition relation (q, ρ) →_A (q′, ρ′) on Conf_A to formalize the semantics of A. We start with the transitions out of the initial state q₀ and those with the 𝔟 or back action.


The most interesting transitions are, however, those of the form q −(A, start(B))→ q′. We make a case distinction based on the launch mode of B. For each transition q −(A, start(B))→ q′ and (q, ρ) ∈ Conf_A such that top²(ρ) = A, we have (q, ρ) →_A (q′, ρ′) if one of the following cases holds. Assume ρ = (S₁, ..., Sₙ).

# Case Lmd(B) = STD


# Case Lmd(B) = STP

	- \* if top(Sᵢ) ≠ B, then ρ′ = Push(Move2Top(ρ, i), B); \* if top(Sᵢ) = B, then ρ′ = Move2Top(ρ, i);

# Case Lmd(B) = SIT


# Case Lmd(B) = STK


• if GetNonSITTaskByAft(ρ, Aft(B)) = Sᵢ<sup>7</sup>,

\* if B does *not* occur in Sᵢ (see footnote 5), then ρ′ = Push(Move2Top(ρ, i), B); \* if B occurs in Sᵢ (see footnote 8), then ρ′ = PopUntil(Move2Top(ρ, i), B);

• if GetNonSITTaskByAft(ρ, Aft(B)) = Undef, then ρ′ = NewTask(ρ, B);
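To see the rules in action, the following self-contained sketch replays the click sequence of Example 1 (green Ag is STD, blue Ab is STP, yellow Ay is STK with a distinct affinity, red Ar is SIT). The `start` function implements only the simplified sub-cases this example exercises, and the names, dictionaries, and list encoding are ours:

```python
# Replaying Example 1. The back stack is a list of tasks (topmost first);
# each task is a list of activities (topmost first).
LMD = {"Ag": "STD", "Ab": "STP", "Ay": "STK", "Ar": "SIT"}
AFT = {"Ag": 1, "Ab": 1, "Ar": 1, "Ay": 2}   # yellow has a distinct affinity

def task_by_aft(rho, k):
    """Index of the (unique) non-SIT task whose root has affinity k, else None."""
    for i, S in enumerate(rho):
        if AFT[S[-1]] == k and LMD[S[-1]] != "SIT":
            return i
    return None

def start(rho, B):
    """One start(B) step, simplified to the sub-cases the example needs."""
    if LMD[B] == "STP" and rho and rho[0][0] == B:
        return rho                                  # already on top: no action
    if LMD[B] == "SIT":                             # lives in its own singleton task
        for i, S in enumerate(rho):
            if S == [B]:
                return [S] + rho[:i] + rho[i + 1:]  # Move2Top
        return [[B]] + rho                          # NewTask
    i = task_by_aft(rho, AFT[B])
    if i is None:
        return [[B]] + rho                          # NewTask
    moved = [rho[i]] + rho[:i] + rho[i + 1:]        # Move2Top
    if LMD[B] == "STK" and B in moved[0]:
        S = moved[0]
        return [S[S.index(B):]] + moved[1:]         # PopUntil
    return [[B] + moved[0]] + moved[1:]             # Push

rho = []
for click in ["Ag", "Ab", "Ab", "Ay", "Ar", "Ag"]:
    rho = start(rho, click)
print(rho)  # [['Ag', 'Ab', 'Ag'], ['Ar'], ['Ay']]
```

The final value matches the back stack reached at the end of the trace in Example 2: the green activity is pushed onto the blue task (same affinity), which is moved to the top, while the red and yellow singleton tasks remain below.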

This concludes the definition of the transition relation →_A. As usual, we use ⇒_A to denote the reflexive and transitive closure of →_A.

*Example 2.* The ASM for the ActivitiesLaunchDemo app of Example 1 is A = (Q, Sig, q₀, Δ), where Q = {q₀, q₁}, Sig = (Act, Lmd, Aft, A_g) with


and Δ comprises the transitions illustrated in Fig. 2. Below is a path in the graph of →_A corresponding to the sequence of user actions clicking the green, blue, blue, yellow, red, and green buttons (cf. Example 1):

$$\begin{array}{l} (q_0, \varepsilon) \xrightarrow{\mathfrak{b},\, \mathsf{start}(A_g)} (q_1, ([A_g])) \xrightarrow{A_g,\, \mathsf{start}(A_b)} (q_1, ([A_b, A_g])) \xrightarrow{A_b,\, \mathsf{start}(A_b)} \\ (q_1, ([A_b, A_g])) \xrightarrow{A_b,\, \mathsf{start}(A_y)} (q_1, ([A_y], [A_b, A_g])) \xrightarrow{A_y,\, \mathsf{start}(A_r)} \\ (q_1, ([A_r], [A_y], [A_b, A_g])) \xrightarrow{A_r,\, \mathsf{start}(A_g)} (q_1, ([A_g, A_b, A_g], [A_r], [A_y])). \end{array}$$

Proposition 1 reassures us that →_A is indeed a relation on Conf_A, as per Definition 2.

**Proposition 1.** *Let* A *be an ASM. For each* (q, ρ) ∈ Conf_A *with* (q, ρ) →_A (q′, ρ′)*, we have* (q′, ρ′) ∈ Conf_A*, namely,* (q′, ρ′) *satisfies the five constraints of Definition 2.*

**Fig. 2.** ASM corresponding to the ActivitiesLaunchDemo app

*Remark 1.* A single app can clearly be modeled by an ASM. However, an ASM can also be used to model multiple apps which may share tasks/activities. (In this case, the multiple apps can be composed into a single app, where a new main activity is added.) This is especially useful when analysing, for instance, task hijacking [16]. For convenience, we sometimes do not specify the main activity explicitly. The translation from app source code to ASM is not trivial, but follows standard routines. In particular, in ASM, the symbols stored in the back stack are just

<sup>7</sup> If i exists, it must be unique by Definition 2(5). Moreover, i > 1, since Lmd(A) ≠ SIT implies Aft(B) ≠ Aft(S₁).

<sup>8</sup> Note that B occurs at most once in Sᵢ by Definition 2(1).

names of activities. Android apps typically need to store additional local state information, similar to function calls in programs. This can be dealt with by introducing an extended activity alphabet such that each symbol is of the form A(*b*), where A ∈ Act and *b* represents the local information. When we present examples, we also adopt this general syntax.

*Model validation.* We validate the ASM model by designing "diagnosis" Android apps and carrying out extensive experiments. For each case in the semantics of ASM, we design an app which contains activities with the corresponding launch modes and task affinities. To simulate the transition rules of the ASM, each activity contains buttons which, when clicked, launch other activities. For instance, for the case Lmd(B) = STD, Lmd(A) = SIT, GetNonSITTaskByAft(ρ, Aft(B)) = Undef, the app contains two activities A and B of launch modes SIT and STD respectively, where A is the main activity. When the app is launched, an instance of A is started. A contains a button which, when clicked, starts an instance of B. We carry out the experiment by clicking the button, monitoring the content of the back stack, and checking whether it conforms to the definition of the semantics. Specifically, we check that there are exactly two tasks in the back stack, one comprising a single instance of A and the other comprising a single instance of B, with the latter task on top. Our experiments were done on a Redmi 4A mobile phone running Android version 6.0.1. The details of the experiments can be found at https://sites.google.com/site/assconformancetesting/.

# **4 Reachability of ASM**

Towards formal (static) analysis and verification of Android apps, we study the fundamental *reachability* problem of ASM. Fix an ASM A = (Q, Sig, q₀, Δ) with Sig = (Act, Lmd, Aft, A₀) and a *target state* q ∈ Q. There are usually two variants: the *state reachability problem* asks whether (q₀, ε) ⇒_A (q, ρ) for *some* back stack ρ, and the *configuration reachability problem* asks whether (q₀, ε) ⇒_A (q, ρ) when ρ is also given. We show that the two variants are interreducible as far as decidability is concerned.

**Proposition 2.** *The configuration reachability problem and the state reachability problem of ASM are interreducible in exponential time.*

Proposition 2 allows us to focus on the state reachability problem in the rest of this paper. Observe that, when all the activities of an ASM have the same launch mode, the problem degenerates to that of standard pushdown systems or even finite-state systems. These systems are well understood, and we refer to [6] for explanations. To proceed, we deal with the cases where there are exactly two launch modes, for which there are $\binom{4}{2} = 6$ possibilities. The classification is given in Theorems 1 and 2. Clearly, they entail that reachability for general ASM (with at least two launch modes) is undecidable. To show the undecidability, we reduce from Minsky's two-counter machines [14], which, albeit standard, reveals the expressive power of ASM. We remark that the capability of *swapping the order* of two distinct non-SIT tasks in the back stack—*without resetting* the content of either of them—is the main source of undecidability.

**Theorem 1.** *The reachability problem of ASM is undecidable, even when the ASM contains only (1)* STD *and* STK *activities, or (2)* STD *and* SIT *activities, or (3)* STK *and* STP *activities, or (4)* SIT *and* STP *activities.*

In contrast, we have some relatively straightforward positive results:

**Theorem 2.** *The state reachability problem of ASM is decidable in polynomial time when the ASM contains* STD *and* STP *activities only, and in polynomial space when the ASM contains* STK *and* SIT *activities only.*

As mentioned in Sect. 1, we aim to identify expressive fragments of ASM with a decidable reachability problem. To this end, we introduce a fragment called STK**-dominating ASM**, which accommodates all four launch modes.

**Definition 3 (**STK**-dominating ASM).** *An ASM is said to be* STKdominating *if the following two constraints are satisfied:*


The following result explains the name "STK-dominating".

**Proposition 3.** *Let* A = (Q, Sig, q₀, Δ) *be an* STK*-dominating ASM with* Sig = (Act, Lmd, Aft, A₀)*. Then each configuration* (q, ρ) *that is reachable from the initial configuration* (q₀, ε) *in* A *satisfies the following constraints: (1) each* STK *activity* A ∈ Act *with* Aft(A) ≠ Aft(A₀) *can only occur at the bottom of some task in* ρ*; (2)* ρ *contains at most one* STD/STP*-task, which, when it exists, has the same affinity as* A₀*.*

It is not difficult to verify that the ASM given in Example 2 is STK-dominating.

**Theorem 3.** *The state reachability of* STK*-dominating ASM is in* 2-EXPTIME*.*

The proof of Theorem 3 is technically the most challenging part of this paper. We give a sketch in Sect. 5; the full details can be found in [6].

# **5 STK-dominating ASM**

For simplicity, we assume that A **contains** STD **and** STK **activities only**<sup>9</sup>. To tackle the (state) reachability problem for STK-dominating ASM, we consider two cases, namely Lmd(A₀) = STK and Lmd(A₀) ≠ STK. The former case is simpler

<sup>9</sup> The more general case where A also contains STP and SIT activities is slightly more involved and requires more space to present; it can be found in [6].

because, by Proposition 3, all tasks are rooted at STK activities. In the latter, more general case, the back stack may contain, apart from several tasks rooted at STK activities, one single task rooted at A₀. Sections 5.1 and 5.2 handle these two cases, respectively.

We will, however, first introduce some standard, but necessary, background on pushdown systems. We assume familiarity with standard *finite-state automata* (NFA) and *finite-state transducers* (FST). We emphasize that, in this paper, FST refers to a special class of finite-state transducers, namely *letter-to-letter* finite-state transducers whose input and output alphabets coincide.

*Preliminaries on pushdown systems.* A *pushdown system* (PDS) is a tuple P = (Q, Γ, Δ), where Q is a finite set of *control states*, Γ is a finite *stack alphabet*, and Δ ⊆ Q × Γ × Γ* × Q is a finite set of transition rules. The size of P, denoted by |P|, is defined as |Δ|.

Let P = (Q, Γ, Δ) be a PDS. A *configuration* of P is a pair (q, w) ∈ Q × Γ*, where w denotes the *content* of the stack (with the leftmost symbol being the top of the stack). Let Conf_P denote the set of configurations of P. We define a binary relation →_P over Conf_P as follows: (q, w) →_P (q′, w′) iff w = γw₁ and there exists u ∈ Γ* such that (q, γ, u, q′) ∈ Δ and w′ = uw₁. We use ⇒_P to denote the *reflexive and transitive closure* of →_P.

A configuration (q′, w′) is *reachable* from (q, w) if (q, w) ⇒_P (q′, w′). For C ⊆ Conf_P, pre*(C) (resp. post*(C)) denotes the set of *predecessor* (resp. *successor*) configurations {(q′, w′) | ∃(q, w) ∈ C. (q′, w′) ⇒_P (q, w)} (resp. {(q′, w′) | ∃(q, w) ∈ C. (q, w) ⇒_P (q′, w′)}). For q ∈ Q, we define C_q = {q} × Γ* and write pre*(q) and post*(q) as shorthand for pre*(C_q) and post*(C_q), respectively.

As a standard machinery for solving reachability of PDS, a P-*multi-automaton* (P-MA) is an NFA A = (Q′, Γ, δ, I, F) such that I ⊆ Q ⊆ Q′ [4]. Evidently, multi-automata are a special class of NFA. Let A = (Q′, Γ, δ, I, F) be a P-MA and (q, w) ∈ Conf_P; (q, w) is *accepted* by A if q ∈ I and there is an accepting run q₀q₁···qₙ of A on w with q₀ = q. Let Conf_A denote the set of configurations accepted by A. Moreover, let L(A) denote the set of words w such that (q, w) ∈ Conf_A for some q ∈ I. For brevity, we usually write MA instead of P-MA when P is clear from the context. Moreover, for an MA A = (Q′, Γ, δ, I, F) and q′ ∈ Q′, we use A(q′) to denote the MA obtained from A by replacing I with {q′}. A set of configurations C ⊆ Conf_P is *regular* if there is an MA A such that Conf_A = C.

**Theorem 4 (**[4]**).** *Given a PDS* P *and a set of configurations accepted by an MA* A*, one can compute, in time polynomial in* |P| + |A|*, two MAs* A_pre* *and* A_post* *that recognise* pre*(Conf_A) *and* post*(Conf_A)*, respectively.*
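The algorithm behind Theorem 4 is the classical saturation procedure: to compute pre*, repeatedly add an automaton transition (q, γ, s) whenever some rule (q, γ, w, q′) exists and the current automaton can already read w from q′ ending in s. A minimal sketch under our own encoding (this is an illustration, not the tool of [4]):

```python
def pre_star(rules, ma_trans):
    """Saturation procedure for pre* of a PDS.
    rules:    PDS rules (q, gamma, w, q2), with w a tuple of stack symbols
    ma_trans: MA transitions (s, gamma, s2); MA states include all PDS states
    Returns the saturated set of MA transitions."""
    trans = set(ma_trans)

    def reach(s, w):
        """All MA states reachable from s by reading the word w."""
        frontier = {s}
        for gamma in w:
            frontier = {t2 for t in frontier
                        for (t1, g, t2) in trans if t1 == t and g == gamma}
        return frontier

    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for (q, gamma, w, q2) in rules:
            for s in reach(q2, w):
                if (q, gamma, s) not in trans:
                    trans.add((q, gamma, s))   # (q, gamma) can reach s in pre*
                    changed = True
    return trans
```

For state reachability of a target state q, one would start from an MA accepting C_q = {q} × Γ* and, after saturation, check whether the initial configuration is accepted. Termination follows since no new states are created, so the transition set is bounded.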

The connection between ASM and PDS is rather obvious. In a nutshell, an ASM can be considered as a PDS with *multiple* stacks, which is well known to be undecidable in general. Our overall strategy for attacking the state reachability problem for the fragments of ASM is to simulate them (in particular, the multiple stacks) via—in some cases, decidable extensions of—PDS.

### **5.1 Case Lmd(***A***0) = STK**

Our approach in this case is to simulate A by an *extension* of PDS, namely *pushdown systems with transductions* (TrPDS), proposed in [19]. In a TrPDS, each transition is associated with an FST defining how the stack content is modified. Formally, a TrPDS is a tuple P = (Q, Γ, *T*, Δ), where Q and Γ are precisely as in a PDS, *T* is a finite set of FSTs over the alphabet Γ, and Δ ⊆ Q × Γ × Γ* × *T* × Q is a finite set of transition rules. Let R(*T*) denote the set of transductions defined by the FSTs from *T*, and let R̄(*T*) denote the *closure* of R(*T*) under composition and left-quotient. A TrPDS P is said to be *finite* if R̄(*T*) is finite.

The configurations of P are defined as for PDS. We define a binary relation →_P on Conf_P as follows: (q, w) →_P (q′, w′) if there are γ ∈ Γ, words w₁, u, w₂, and T ∈ *T* such that w = γw₁, (q, γ, u, T, q′) ∈ Δ, w₁ −T→ w₂, and w′ = uw₂. Let ⇒_P denote the reflexive and transitive closure of →_P. As for PDS, we can define pre*(·) and post*(·). Regular sets of configurations of a TrPDS can be represented by MAs, in line with PDS. More precisely, given a finite TrPDS P = (Q, Γ, *T*, Δ) and an MA A for P, one can compute, in time polynomial in |P| + |R̄(*T*)| + |A|, two MAs A_pre* and A_post* that recognize the sets pre*(Conf_A) and post*(Conf_A), respectively [17–19].

To simulate A by a finite TrPDS P, the back stack ρ = (S₁, ..., Sₙ) of A is encoded as the word S₁♯S₂♯···Sₙ♯⊥ (where ♯ is a delimiter and ⊥ is the bottom symbol of the stack), which is stored in the stack of P. Recall that, in this case, each task Sᵢ is rooted at an STK activity which sits at the bottom of Sᵢ. Suppose top(S₁) = A. When a transition (q, A, start(B), q′) with B ∈ Act_STK is fired, according to the semantics of A, the B-task of ρ, say Sᵢ, is switched to the top of ρ and changed into [B] (i.e., all the activities in the B-task, except B itself, are popped). To simulate this in P, we replace every stack symbol in the place of Sᵢ with a dummy symbol † and keep the other symbols unchanged. On the other hand, to simulate a back action of A, P continues popping until the next non-dummy and non-delimiter symbol is seen.
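This encoding can be sketched with concrete symbols. In the sketch below, `"#"`, `"!"`, and `"_"` are our stand-ins for the delimiter, the dummy symbol †, and the bottom symbol ⊥, and `start_stk`/`back` mirror the two simulation steps just described (in the actual TrPDS, the dummy rewriting is performed by transductions rather than by direct list surgery):

```python
SEP, DUMMY, BOT = "#", "!", "_"   # stand-ins for the delimiter, dummy, bottom

def encode(back_stack):
    """(S1, ..., Sn) -> S1 # S2 # ... # Sn # _ as a flat list of symbols."""
    word = []
    for task in back_stack:
        word += task + [SEP]
    return word + [BOT]

def start_stk(word, B):
    """Starting the STK activity B: the segment of the B-task (B is its
    root, hence its last symbol) is overwritten with dummies in place,
    and the task is recreated as [B] on top."""
    if B in word:
        i = word.index(B)                 # B occurs (at most) once, as a root
        seg_start = i
        while seg_start > 0 and word[seg_start - 1] != SEP:
            seg_start -= 1                # walk back to the task's top symbol
        word = word[:seg_start] + [DUMMY] * (i - seg_start + 1) + word[i + 1:]
    return [B, SEP] + word

def back(word):
    """Pop the top activity, then skip dummies and delimiters of
    emptied tasks until a real symbol (or the bottom) is exposed."""
    word = word[1:]
    while word and word[0] in (SEP, DUMMY):
        word = word[1:]
    return word
```

For instance, encoding the back stack ([C, A], [B]) and then starting the STK activity B leaves the old B-task as a dummied-out segment while a fresh [B] task appears on top, exactly as in the construction above.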

**Proposition 4.** *Let* A = (Q, Sig, q₀, Δ) *be an* STK*-dominating ASM with* Sig = (Act, Lmd, Aft, A₀) *and* Lmd(A₀) = STK*. Then a finite TrPDS* P = (Q′, Γ, *T*, Δ′) *with* Q ⊆ Q′ *can be constructed in time polynomial in* |A| *such that, for each* q ∈ Q*,* q *is reachable from* (q₀, ε) *in* A *iff* q *is reachable from* (q₀, ⊥) *in* P*.*

For a state q ∈ Q, pre*_P(q) can be effectively computed as an MA B_q, and the reachability of q in A thus reduces to checking whether (q₀, ⊥) ∈ Conf_{B_q}.

#### **5.2 Case Lmd(*A*₀) ≠ STK**

We now turn to the more general case Lmd(A₀) ≠ STK, which is significantly more involved. For expository purposes, we consider an ASM A in which **there are exactly two** STK **activities** A₁, A₂, and the task affinity of A₂ is the same as that of the main activity A₀ (and thus the task affinity of A₁ differs from that of A₀). We also assume that all the activities in A are "standard" except A₁, A₂; namely, Act = Act_STD ∪ {A₁, A₂} and, in particular, A₀ ∈ Act_STD. Neither of these two assumptions is fundamental, and their generalization is given in [6].

By Proposition 3, there are at most two tasks in the back stack of A: either an A₀-task and an A₁-task, or an A₂-task and an A₁-task. An A₂-task can only surface after the original A₀-task has been popped empty. If this happens, no A₀-task will ever be recreated, and thus, by the arguments of Sect. 5.1, we can simulate the ASM by a TrPDS directly, and we are done. The challenging case is when we have both an A₀-task and an A₁-task. To solve the state reachability problem, the main technical difficulty is that the order of the A₀-task and the A₁-task may be switched arbitrarily many times before the target state q is reached. Readers may wonder why this cannot simply simulate two-counter machines. The reason is that the two tasks are *asymmetric*: each time the A₁-task is switched from the bottom to the top (by starting the activity A₁), the content of the A₁-task is reset to [A₁]. This is *not* the case for the A₀-task: when the A₀-task is switched from the bottom to the top (by starting the activity A₂), if it does not contain A₂, then A₂ is pushed onto the A₀-task; otherwise all the activities above A₂ are popped and A₂ becomes the top activity of the A₀-task. Our decision procedure below exploits the asymmetry of the two tasks.

*Intuition of the construction.* The crux of the reachability analysis is to construct a *finite abstraction* of the A₁-task and incorporate it into the control states of A, so that we can reduce the state reachability of A to that of a pushdown system P_A (with a single stack). Observe that a run of A can be seen as a sequence of task switchings. In particular, an A₀; A₁; A₀ *switching* denotes a path in →_A where the A₀-task is on top in the *first* and *last* configuration, while the A₁-task is on top in all *intermediate* configurations. The main idea of the reduction is to simulate an A₀; A₁; A₀ switching by a "macro"-transition of P_A. Note that the A₀-task becomes the top task again in the last configuration either by starting the activity A₂ or by emptying the A₁-task. Suppose that, for an A₀; A₁; A₀ switching, in the first (resp. last) configuration, q (resp. q′) is the control state and α (resp. β) is the finite abstraction of the A₁-task. Then, for the corresponding "macro"-transition of P_A, the control state is updated from (q, α) to (q′, β), and the stack content of P_A is updated accordingly:


Roughly speaking, the abstraction of the A₁-task must carry the following information: when the A₀-task and the A₁-task are the top resp. bottom task of the back stack and the A₀-task is emptied, can the target state q be reached from the configuration at that time? Accordingly, we define the abstraction of the A₁-task whose content is encoded by a word w ∈ Act*, denoted by α(w), as the set of all states q′ ∈ Q such that the target state q can be reached from (q′, (w)) in A. [Note that during the process in which q is reached from (q′, (w)) in A, the A₀-task does not exist any more, but a (new) A₂-task may be formed.] Let Abs_{A₁} = 2^Q.

To facilitate the construction of the PDS P_A, we also need to record how the abstraction "evolves". For each (q′, A, α) ∈ Q × (Act \ {A₁}) × Abs_{A₁}, we compute the set Reach(q′, A, α) consisting of the pairs (q, β) satisfying: there is an A₀; A₁; A₀ switching such that, in the first configuration, A is the top symbol of the A₀-task, q′ (resp. q) is the control state of the first (resp. last) configuration, and α (resp. β) is the abstraction of the A₁-task in the first (resp. last) configuration.<sup>10</sup>

*Computing* Reach(q′, A, α). Let (q′, A, α) ∈ Q × (Act \ {A₁}) × Abs_{A₁}. We first simulate the relevant parts of A as follows:

– Following Sect. 5.1, we construct a TrPDS P_{A₀} = (Q_{A₀}, Γ_{A₀}, *T*_{A₀}, Δ_{A₀}) to simulate *the* A₁-*task and the* A₂-*task* of A after the A₀-task has been emptied, where Q_{A₀} = Q ∪ Q × Q and Γ_{A₀} = Act ∪ {♯, †, ⊥}. Note that A₀ may still—as a "standard" activity—occur in P_{A₀}, although the A₀-task has disappeared. In addition, we construct an MA B_q = (Q_q, Γ_{A₀}, δ_q, I_q, F_q) to represent

pre<sup>∗</sup> PA0 (q), where <sup>I</sup><sup>q</sup> <sup>⊆</sup> <sup>Q</sup>A<sup>0</sup> . Then given a stack content <sup>w</sup> <sup>∈</sup> Act<sup>∗</sup> STDA<sup>1</sup> of the <sup>A</sup>1-task, the abstraction <sup>α</sup>(w) of <sup>w</sup>, is the set of <sup>q</sup> <sup>∈</sup> <sup>I</sup>q <sup>∩</sup> <sup>Q</sup> such that (q, w⊥) <sup>∈</sup> ConfB<sup>q</sup> .

– We construct a PDS P_{A0,A2} = (Q_{A0,A2}, Γ_{A0,A2}, T_{A0,A2}, Δ_{A0,A2}) to simulate the A1-*task* of A, where Γ_{A0,A2} = (Act \ {A2}) ∪ {⊥}. In addition, to compute Reach(q′, A, α) later, we construct an MA M_{(q′,A,α)} = (Q_{(q′,A,α)}, Γ_{A0,A2}, δ_{(q′,A,α)}, I_{(q′,A,α)}, F_{(q′,A,α)}) to represent

$$\mathsf{post}^{*}_{\mathcal{P}_{A_0,A_2}}(\{(q_1, A_1\bot) \mid (q', A, \mathsf{start}(A_1), q_1) \in \Delta\}).$$
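The multi-automata B_q and M_{(q′,A,α)} above can be obtained by the standard saturation procedures for pushdown systems. As a generic illustration (an assumed, minimal Python helper, not the paper's construction; PDS rules are tuples (p, γ, p′, w) with |w| ≤ 2), pre* saturation adds an edge (p, γ, q) to the multi-automaton whenever the rule's target configuration is already accepted:

```python
def word_targets(edges, state, word):
    # Automaton states reachable from `state` by reading `word`.
    frontier = {state}
    for letter in word:
        frontier = {q for (s, a, q) in edges if s in frontier and a == letter}
    return frontier

def pre_star(rules, edges):
    # Saturation for pre*: for a PDS rule (p, g) -> (p2, w), add the
    # edge (p, g, q) whenever p2 --w--> q holds in the current automaton.
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for (p, g, p2, w) in rules:
            for q in word_targets(edges, p2, w):
                if (p, g, q) not in edges:
                    edges.add((p, g, q))
                    changed = True
    return edges
```

For instance, with the single pop rule (p0, a) → (p1, ε) and an automaton accepting (p1, ⊥), saturation adds the edge (p0, a, p1), so the predecessor configuration (p0, a⊥) is recognized.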

**Definition 4.** Reach(q′, A, α) *comprises*


Importantly, conditions in Definition 4 can be characterized algorithmically.

**Lemma 1.** *For* (q′, A, α) ∈ Q × (Act \ {A1}) × Abs_A1*,* Reach(q′, A, α) *is the union of*

*–* {(q, ⊥) | (q, ⊥) ∈ Conf_{M_{(q′,A,α)}}} *and*

*– the set of pairs* (q, β) ∈ Q × Abs_A1 *such that there exist* q2 ∈ Q *and* B ∈ Act \ {A2} *satisfying that* (q2, B, start(A2), q) ∈ Δ *and* (B(Act \ {A2})* ⊥) ∩ (Act_STD* A1 ⊥) ∩ (L(M_{(q′,A,α)}(q2))⊥⁻¹)⊥ ∩ Lβ ≠ ∅*, where* L(M_{(q′,A,α)}(q2))⊥⁻¹ *is the set of words* w *such that* w⊥ *belongs to* L(M_{(q′,A,α)}(q2))*, and* Lβ = ⋂_{q′′∈β} L(B_q(q′′)) ∩ ⋂_{q′′∈Q\β} L̄(B_q(q′′))*, with* L̄ *representing the complement language of* L*.*

¹⁰ As we will see later, Reach(q′, A, α) does not depend on α for the two-task special case considered here. We keep α for readability.

*Construction of* P_A. We first construct a PDS P_A0 = (Q_A0, Γ_A0, Δ_A0) to simulate the A0-task of A. Here Q_A0 = (Q × {0, 1}) ∪ (Q × {1} × {pop}), Γ_A0 = Act_STD ∪ {A2, ⊥}, and Δ_A0 comprises the corresponding transitions. Here 1 (resp. 0) marks that the activity A2 is in the stack (resp. is not in the stack), and the tag pop marks that the PDS is in the process of popping until A2. The construction of P_A0 is relatively straightforward; the details can be found in [6].

We then define the PDS P_A = (Q_A, Γ_A0, Δ_A), where Q_A = (Abs_A1 × Q_A0) ∪ {q}, and Δ_A comprises the following transitions:

– if A ≠ A2, then we have ((α, (q′, 0)), A, A2A, (β, (q, 1))) ∈ Δ_A and ((α, (q′, 1)), A, ε, (β, (q, 1, pop))) ∈ Δ_A,
– if A = A2, then we have ((α, (q′, 1)), A2, A2, (β, (q, 1))) ∈ Δ_A, [**switch to the** A1-**task and switch back to the** A0-**task later by launching** A2]

**Proposition 5.** *Let* <sup>A</sup> *be an* STK*-dominating ASM where there are exactly two* STK*-activities* <sup>A</sup>1, A<sup>2</sup> *and* Aft(A2) = Aft(A0)*. Then* <sup>q</sup> *is reachable from the initial configuration* (q0, ε) *in* A *iff* q *is reachable from the initial configuration* ((∅,(q0, 0)), ⊥) *in* PA*.*

# **6 Related Work**

We first discuss *pushdown systems with multiple stacks* (MPDSs), which are the most relevant to ASMs. (For space reasons we skip results on general pushdown systems.) A multitude of classes of MPDSs have been considered, mostly as models for *concurrent* recursive programs. In general, an ASM can be encoded as an MPDS. However, this view is hardly profitable, as general MPDSs are Turing-complete, leaving the reachability problem undecidable.

To regain decidability at least for reachability, several subclasses of MPDSs were proposed in the literature: (1) bounding the number of context switches [15], or more generally, phases [10], scopes [11], or budgets [3]; (2) imposing a linear ordering on stacks, with pop operations restricted to the first non-empty stack [5]; (3) restricting control states (e.g., *weak* MPDSs [7]). However, our decidable subclasses of ASM admit none of these boundedness conditions. A unified and generalized criterion [12], based on MSO over graphs of bounded tree-width, was proposed to show the decidability of the emptiness problem for several restricted classes of automata with auxiliary storage, including MPDSs, automata with queues, or a mix of them. Since ASMs work in a way fairly different from the multi-stack models in the literature, it is unclear, at least to us, how to obtain decidability via the bounded tree-width approach. Moreover, [12] only provides decidability proofs, without complexity upper bounds. Our decision procedure is based on symbolic approaches for pushdown systems, which provide complexity upper bounds and are amenable to implementation.

Higher-order pushdown systems represent another generalization of pushdown systems through higher-order stacks, i.e., a nested "stack of stacks" structure [13], with decidable reachability problems [9]. Despite apparent resemblance, the back stack of an ASM can *not* be simulated by an order-2 pushdown system: the order of tasks in a back stack may change dynamically, which order-2 pushdown systems do not support.

On a different line, some models have addressed the GUI activities of Android apps. *Window transition graphs* were proposed to represent the possible GUI activity (window) sequences and their associated events and callbacks, capturing how the events and callbacks modify the back stack [21]. However, this model covers neither of the key mechanisms of back stacks (launch modes and task affinities), and its reachability problem was not investigated. A similar model, the labeled transition graph with stack and widget (LATTE [20]), considers the effects of launch modes on back stacks, but not task affinities. LATTE is essentially a finite-state abstraction of the back stack; to faithfully capture launch modes and task affinities, however, one needs an infinite-state system, as we have studied here.

### **7 Conclusion**

In this paper, we have introduced the Android stack machine (ASM) to formalize the back stack system of the Android platform, and we have investigated the decidability of its reachability problem. While the reachability problem of ASM is undecidable in general, we have identified a fragment, namely STK-dominating ASMs, which is expressive and admits decision procedures for reachability.

The implementation of the decision procedures is in progress. We also plan to consider further features of the Android back stack system, e.g., the "allowTaskReparenting" attribute of activities. A long-term goal is to develop an efficient and scalable formal analysis and verification framework for Android apps, toward which the work reported in this paper is a first cornerstone.

# **References**



# **Formally Verified Montgomery Multiplication**

Christoph Walther(B)

Technische Universität Darmstadt, Darmstadt, Germany Chr.Walther@informatik.tu-darmstadt.de

**Abstract.** We report on a machine assisted verification of an efficient implementation of Montgomery Multiplication which is a widely used method in cryptography for efficient computation of modular exponentiation. We shortly describe the method, give a brief survey of the VeriFun system used for verification, present the formal proofs and report on the effort for creating them. Our work uncovered a serious fault in a published algorithm for computing multiplicative inverses based on Newton-Raphson iteration, thus providing further evidence for the benefit of computer-aided verification.

**Keywords:** Modular arithmetic · Multiplicative inverses · Montgomery Multiplication · Program verification · Theorem proving by induction

# **1 Introduction**

Montgomery Multiplication [6] is a method for efficient computation of residues a^j mod n, which are widely used in cryptography, e.g. for RSA, Diffie-Hellman, ElGamal, DSA, ECC etc. [4,5]. The computation of these residues can be seen as an iterative calculation in the commutative ring with identity R_n = (N_n, ⊕, i_n, ⊙, 0, 1 mod n) where n ≥ 1, N_n = {0, ..., n−1}, addition is defined by a ⊕ b = (a + b) mod n, the inverse operator by i_n(a) = a · (n−1) mod n, multiplication by a ⊙ b = a · b mod n, with neutral element 0 and identity 1 mod n.

For any m ∈ N relatively prime to n, some m⁻¹_n ∈ N_n exists such that m ⊙ m⁻¹_n = 1 mod n. m⁻¹_n is called the *multiplicative inverse* of m in R_n and is used to define a further commutative ring with identity R^m_n = (N_n, ⊕, i_n, ⊗, 0, m mod n) where multiplication is defined by a ⊗ b = a ⊙ b ⊙ m⁻¹_n and the identity is given as m mod n. The multiplication ⊗ of R^m_n is called *Montgomery Multiplication*.

The rings R_n and R^m_n are isomorphic via the isomorphism h : R_n → R^m_n defined by h(a) = a ⊙ m and h⁻¹ : R^m_n → R_n given by h⁻¹(a) = a ⊙ m⁻¹_n. Consequently, a · b mod n can be calculated in the ring R^m_n as well because

$$a \cdot b \bmod n = a \odot b = h^{-1}(h(a \odot b)) = h^{-1}(h(a) \otimes h(b)). \tag{\*}$$
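The correspondence (∗) can be checked numerically. The following sketch uses assumed small example values (n = 13, m = 16; `pow(m, -1, n)` requires Python ≥ 3.8) and implements the operations of R_n and R^m_n directly:

```python
# Numeric sketch of R_n and R_n^m (assumed example values, not from the paper).
n, m = 13, 16                 # n odd, m a power of two, gcd(n, m) = 1
m_inv = pow(m, -1, n)         # the multiplicative inverse of m in R_n

def odot(a, b):               # multiplication of R_n
    return (a * b) % n

def otimes(a, b):             # Montgomery Multiplication of R_n^m
    return (a * b * m_inv) % n

def h(a):                     # isomorphism h : R_n -> R_n^m
    return (a * m) % n

def h_inv(a):                 # inverse isomorphism h^{-1}
    return (a * m_inv) % n

a, b = 7, 11
assert odot(a, b) == h_inv(otimes(h(a), h(b)))   # equation (*)
```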

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 505–522, 2018. https://doi.org/10.1007/978-3-319-96142-2\_30

```
function redc(x, z, m, n:N):N <=
  if m ≠ 0
  then let q := (x + n · (x · z mod m))/m in
         if n > q then q else q − n end if
       end let
  end if

function redc∗(x, z, m, n, j:N):N <=
  if m ≠ 0
  then if n ≠ 0
       then if j = 0
            then m mod n
            else redc(x · redc∗(x, z, m, n, ⁻(j)), z, m, n)
            end if
       end if
  end if
```
**Fig. 1.** Procedures redc and redc<sup>∗</sup> implementing the Montgomery Reduction

The required operations h, <sup>⊗</sup> and <sup>h</sup>−<sup>1</sup> can be implemented by the so-called *Montgomery Reduction* redc [6] (displayed in Fig. 1) as stated by Theorem 1:

**Theorem 1.** *Let* a, b, n, m ∈ N *with* m > n > a*,* n > b *and* n, m *relatively prime, let* I = i_m(n⁻¹_m) *and let* M = m² mod n*. Then* I *is called the* Montgomery Inverse *and (1)* h(a) = *redc*(a · M, I, m, n)*, (2)* a ⊗ b = *redc*(a · b, I, m, n)*, and (3)* h⁻¹(a) = *redc*(a, I, m, n)*.*

By (∗) and Theorem 1, a · b mod n can be computed by procedure redc, and consequently a^j mod n can be computed by iterated calls of redc (implemented by procedure redc∗ of Fig. 1), as stated by Theorem 2:

**Theorem 2.** *Let* a, n, m, I *and* M *be as in Theorem 1. Then for all* j ∈ N:¹

a^j mod n = *redc*(*redc*∗(*redc*(a · M, I, m, n), I, m, n, j), I, m, n).

By Theorem 2, j + 2 calls of redc are required for computing a^j mod n, viz. one call to map a to h(a), j calls for the Montgomery Multiplications, and one call to map the result back with h⁻¹. This approach allows for an efficient computation of a^j mod n in R^m_n (for sufficiently large j) if m is chosen as a power of 2 and n as an odd number, because x mod m can then be computed in constant time and x/m only needs effort proportional to log m in procedure redc, thus saving the expensive mod n operations in R_n.

¹ Exponentiation is defined here with 0⁰ = 1 so that *redc*(*redc*∗(*redc*(0 · M, I, m, n), I, m, n, 0), I, m, n) = 1 mod n holds in particular.
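Theorems 1 and 2 can be exercised with a small executable sketch (an assumed Python transcription of Fig. 1; the names `redc_star` and `modexp_montgomery` are ours, and `pow(n, -1, m)` requires Python ≥ 3.8):

```python
def redc(x, z, m, n):
    # Montgomery Reduction of Fig. 1: z is the Montgomery Inverse I,
    # i.e. (-(n^-1)) mod m, so m divides x + n * ((x * z) mod m).
    q = (x + n * ((x * z) % m)) // m
    return q if n > q else q - n

def redc_star(x, z, m, n, j):
    # Iterative rendering of redc* from Fig. 1: fold j Montgomery
    # Multiplications by x over the identity m mod n of R_n^m.
    r = m % n
    for _ in range(j):
        r = redc(x * r, z, m, n)
    return r

def modexp_montgomery(a, j, n, m):
    # a^j mod n following Theorem 2 (m a power of two, n odd, m > n > a).
    I = (-pow(n, -1, m)) % m      # Montgomery Inverse I = i_m(n^-1_m)
    M = (m * m) % n
    ha = redc(a * M, I, m, n)     # h(a), by Theorem 1(1)
    return redc(redc_star(ha, I, m, n, j), I, m, n)
```

For instance, `modexp_montgomery(7, 5, 13, 16)` agrees with `pow(7, 5, 13)`.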

#### **2 About VeriFun**

The truth of Theorems 1 and 2 is not obvious at all, and some number theory with modular arithmetic is needed for proving them. Formal proofs are worthwhile because correctness of cryptographic methods is based on these theorems.

```
structure bool <= true, false
structure N <= 0, +(⁻:N)
structure signs <= '+', '−'
structure Z <= [outfix] ⟨...⟩ : (sign:signs, [outfix] |...| :N)
structure triple[@T1, @T2, @T3] <=
  [outfix] ⟨...⟩ : ([postfix] (...)₁:@T1, [postfix] (...)₂:@T2, [postfix] (...)₃:@T3)
```
lemma z ≠ 0 → [x · (y mod z) ≡ x · y] mod z <= ∀ x, y, z:N if{¬ z = 0, (x · (y mod z) mod z) = (x · y mod z), true}

**Fig. 2.** Example definitions of data structures and a lemma

Proof assistants like Isabelle/HOL, HOL Light, Coq, ACL2 and others have been shown successful for developing formal proofs in Number Theory (see e.g. [14]). Here we use the VeriFun system² [7,10] to verify correctness of Montgomery Multiplication by proving Theorems 1 and 2. The system's object language consists of universal first-order formulas plus parametric polymorphism. Type variables may be instantiated with polymorphic types. Higher-order functions are not supported. The language provides principles for defining data structures, procedures operating on them, and statements (called "lemmas") about the data structures and procedures. Unicode symbols may be used, and function symbols can be written in out-, in-, pre- and postfix notation, so that readability is increased by use of familiar mathematical notation. Figure 2 displays some examples. The data structure bool and the data structure N for natural numbers, built with the constructors 0 and +(...) for the successor function, are the only predefined data structures in the system. ⁻(...) is the selector of +(...), thus representing the predecessor function. Subsequently we also need the integers Z, which we define in Fig. 2 as signed natural numbers. For instance, the expression ⟨'−', 42⟩ is a data object of type Z, selector *sign* yields the sign of an integer (like '−' in the example), and selector |...| gives the absolute value of an integer (like 42 in the example). Identifiers preceded by @ denote type variables, and therefore polymorphic triples are defined in Fig. 2. The expression ⟨42, ⟨'+', 47⟩, ⟨'−', 5⟩⟩ is an example of a data object of type *triple*[N, Z, Z]. The i-th component of a triple is obtained by selector (...)ᵢ.

Procedures are defined by *if* - and *case*-conditionals, functional composition and recursion like displayed in Fig. 1. Procedure calls are evaluated eagerly,

<sup>2</sup> An acronym for "A Verifier for Functional Programs".

i.e. call-by-value. The use of incomplete conditionals, as for redc and redc∗, results in incompletely defined procedures [12]. Such a feature is required when working with polymorphic data structures, but is useful for monomorphic data structures too, as it avoids the need for stipulating artificial results, e.g. for n/0. Predicates are defined by procedures with result type bool. Procedure function [infix] >(x, y:N):bool <= ... for deciding the greater-than relation is the only predefined procedure in the system. Upon the definition of a procedure, VeriFun's automated termination analysis (based on the method of *Argument-Bounded Functions* [8,11]) is invoked for generating termination hypotheses which are sufficient for the procedure's termination and are proved like lemmas. Afterwards, induction axioms are computed from the terminating procedures' recursion structure, to be on stock for future use.

Lemmas are defined with conditionals *if* : bool × bool × bool → bool as the main connective, but negation ¬ and *case*-conditionals may be used as well. Only universal quantification is allowed for the variables of a lemma. Figure 2 displays a lemma about (the elsewhere defined) procedure mod (computing the remainder function) which is frequently used in subsequent proofs. The string in the headline (between "lemma" and "<=") is just an identifier assigning a name to the lemma for reference and must not be confused with the statement of the lemma given as a boolean term in the lemma body. Some basic lemmas about equality and >, e.g. stating transitivity of = and >, are predefined in the system. Predefined lemmas are frequently used in almost every case study so that work is eased by having them always available instead of importing them from some proof library.

Lemmas are proved with the *HPL*-calculus (abbreviating *Hypotheses*, *Programs* and *Lemmas*) [10]. The most relevant proof rules of this calculus are *Induction*, *Use Lemma*, *Apply Equation*, *Unfold Procedure*, *Case Analysis* and *Simplification*. Formulas are given as sequents of the form H, IH ⊢ goal, where H is a finite set of *hypotheses* given as literals, i.e. negated or unnegated predicate calls and equations, *IH* is a finite set of *induction hypotheses* given as partially quantified boolean terms, and goal is a boolean term, called the *goalterm* of the sequent. A deduction in the *HPL*-calculus is represented by a tree whose nodes are given by sequents. A lemma with body ∀ . . . goal is *verified* iff *(i)* the goalterm of each sequent at a leaf of the proof tree rooted in {}, {} ⊢ goal equals *true* and *(ii)* each lemma applied by *Use Lemma* or *Apply Equation* when building the proof tree is *verified*. The base of this recursive definition is given by lemmas proved without using other lemmas. Induction hypotheses are treated like *verified* lemmas, but are available only in the sequent they belong to.

The *Induction* rule creates the base and step cases for a lemma from an induction axiom. By choosing *Simplification*, the system's first-order theorem prover, called the *Symbolic Evaluator*, is started for rewriting a sequent's goalterm using the hypotheses and induction hypotheses of the sequent, the definitions of the data structures and procedures as well as the lemmas already *verified*. This reasoner is guided by heuristics, e.g. for deciding whether to use a procedure definition, for speeding up proof search by filtering out useless lemmas, etc. Equality reasoning is implemented by conditional term rewriting with *AC* matching, where the orientation of equations is heuristically established [13]. The Symbolic Evaluator is a fully automatic tool over which the user has no control, thus leaving the *HPL*-proof rules as the only means to guide the system to a proof.

Also the *HPL*-calculus is controlled by heuristics. When applying the *Verify* command to a lemma, the system starts to compute a proof tree by choosing appropriate *HPL*-proof rules heuristically. If a proof attempt gets stuck, the user must step in by applying a proof rule to some leaf of the proof tree (sometimes after pruning some unwanted branch of the tree), and the system then takes over control again. Also it may happen that a further lemma must be formulated by the user before the proof under consideration can be completed. All interactions are menu driven so that typing in proof scripts is avoided (see [7,10]).

VeriFun is implemented in Java, and installers for running the system under *Windows*, *Unix/Linux* or *Mac* are available from the web [7]. When working with the system, we use proof libraries which have been set up over the years by extending them with definitions and lemmas of general interest. When importing a definition or a lemma from a library into a case study, all program elements and proofs the imported item depends on are imported as well. The correctness proofs for Montgomery Multiplication depend on 9 procedures and 96 lemmas from our arithmetic proof library, which ranges from simple statements like associativity and commutativity of addition up to more ambitious theorems about primes and modular arithmetic. In the sequel we will only list the lemmas which are essential to understand the proofs and refer to [7] for a complete account of all used lemmas and their proofs.

### **3 Multiplicative Inverses**

We start our development by stipulating how multiplicative inverses are computed. To this effect we have to define some procedure <sup>I</sup> : <sup>N</sup> <sup>×</sup> <sup>N</sup> <sup>→</sup> <sup>N</sup> satisfying<sup>3</sup>

$$\forall x, y \colon \mathbb{N} \ y \neq 0 \land \gcd(x, y) = 1 \to [x \cdot \Im(x, y) \equiv 1] \bmod y \tag{1}$$

$$\forall x, y, z \colon \mathbb{N} \ y \neq 0 \land \gcd(x, y) = 1 \to [z \cdot x \cdot \Im(x, y) \equiv z] \bmod y \tag{2}$$

$$\forall n, x, y, z \colon \mathbb{N} \ y \neq 0 \land \gcd(x, y) = 1 \rightarrow [n + z \cdot x \cdot \Im(x, y) \equiv n + z] \bmod y. \tag{3}$$

Lemma 2 is proved with Lemma 1 and library lemma

$$\forall n, m, x, y \colon \mathbb{N}\ \gcd(n, m) = 1 \land [m \cdot x \equiv m \cdot y] \bmod n \to [x \equiv y] \bmod n \tag{4}$$

after instructing the system to use library lemma

$$\forall x, y, z \colon z \neq 0 \to [x \cdot (y \bmod z) \equiv x \cdot y] \bmod z \tag{5}$$

<sup>3</sup> If *x, y, z* <sup>∈</sup> <sup>Z</sup> and *<sup>n</sup>* <sup>∈</sup> <sup>N</sup>, then *<sup>n</sup>*|*<sup>z</sup>* abbreviates *z mod n* = 0, where *z mod n* <sup>=</sup> <sup>−</sup>(|*z*<sup>|</sup> *mod n*) if *z <* 0, and [*x* ≡ *y*] *mod n* stands for *n*|*x* − *y*. *x mod n* = *y mod n* is sufficient for [*x* ≡ *y*] *mod n* but only necessary, if *x* and *y* have same polarity.

and VeriFun proves Lemma 3 automatically, using Lemma 2 as well as the library lemma

$$\forall n, x, y, z \colon \mathbb{N}\ z \neq 0 \land [x \equiv y] \bmod z \to [x + n \equiv y + n] \bmod z. \tag{6}$$

Multiplicative inverses can be computed straightforwardly with Euler's φ-function, where Lemma 1 is then proved with Euler's Theorem [7,14]. But this approach is very costly and therefore unsuitable for an implementation of Montgomery Multiplication.

```
function euclid(x, y:N):triple[N, Z, Z] <=
  if y = 0
  then ⟨x, ⟨'+', 1⟩, ⟨'+', 0⟩⟩
  else let e := euclid(y, (x mod y)), g := (e)₁, s := (e)₂, t := (e)₃ in
         case sign(s) of
           '+' : ⟨g, ⟨'−', |t|⟩, ⟨'+', |s| + (x/y) · |t|⟩⟩,
           '−' : ⟨g, ⟨'+', |t|⟩, ⟨'−', |s| + (x/y) · |t|⟩⟩
         end case
       end let
  end if

function I_B(x, y:N):N <=
  if y ≠ 0
  then let s := (euclid(x, y))₂ in
         case sign(s) of
           '+' : (|s| mod y),
           '−' : y − (|s| mod y)
         end case
       end let
  end if
```
**Fig. 3.** Computation of multiplicative inverses by the extended Euclidean algorithm
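The recursion of Fig. 3 can be mirrored by a small executable sketch (an assumed Python transcription that works with signed integers directly instead of the sign/absolute-value encoding of Fig. 3; the names `euclid` and `inverse_B` follow the figure):

```python
def euclid(x, y):
    # Extended Euclidean algorithm: returns (g, s, t) with
    # g = gcd(x, y) = x*s + y*t, where s and t may be negative.
    if y == 0:
        return x, 1, 0
    g, s, t = euclid(y, x % y)
    # gcd(x, y) = y*s + (x mod y)*t = x*t + y*(s - (x // y)*t)
    return g, t, s - (x // y) * t

def inverse_B(x, y):
    # Analogue of I_B in Fig. 3: multiplicative inverse of x modulo y,
    # kept within N (requires y != 0 and gcd(x, y) = 1).
    g, s, _ = euclid(x, y)
    assert y != 0 and g == 1
    return s % y      # Python's % already maps negative s into 0..y-1
```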

### **3.1 Bézout's Lemma**

A more efficient implementation of procedure I is based on Bézout's Lemma, stating that the greatest common divisor can be represented as a linear combination of its arguments:

### **Bézout's Lemma**

*For all* x, y ∈ N *some* s, t ∈ Z *exist such that* gcd(x, y) = x · s + y · t.

If y ≠ 0, so that I_B(x, y) := s mod y is defined, and gcd(x, y) = 1 holds, then by Bézout's Lemma [x · I_B(x, y) = x · (s mod y) ≡ x · s ≡ x · s + y · t = 1] mod y. To implement this approach, the integer s needs to be computed, which can be done by the extended Euclidean algorithm displayed in Fig. 3. This approach is more efficient because a call of euclid(x, y) (and in turn of I_B(x, y) given as in Fig. 3) can be computed in time proportional to (log y)² if x < y, whereas the use of Euler's φ-function needs time proportional to 2^(log y) in the context of Montgomery Multiplication (as φ(2^(k+1)) = 2^k).

lemma Bézout's Lemma #1 <= ∀ x, y:N let e := euclid(x, y), g := (e)₁, s := (e)₂, t := (e)₃ in case sign(s) of '+' : x · |s| = y · |t| + g, '−' : x · |s| + g = y · |t| end case end let (7)

lemma Bézout's Lemma #2 <= ∀ x, y:N (euclid(x, y))₁ = gcd(x, y). (8)

**Fig. 4.** Bézout's Lemma

However, s ∈ Z might be negative, so that y + (s mod y) ∈ N instead of s mod y must then be used as the multiplicative inverse of x, because the carriers of the rings R_n and R^m_n are subsets of N. We therefore define I_B as shown in Fig. 3, which complicates the proof of Lemma 1 (with I replaced by I_B), as this definition necessitates a proof of [x · y + x · (s mod y) ≡ 1] mod y if s < 0.

Bézout's Lemma is formulated in our system's notation by the pair of lemmas displayed in Fig. 4. When prompted to prove Lemma 7, the system starts a Peano induction upon x but gets stuck in the step case. We therefore instruct it to use induction corresponding to the recursion structure of procedure euclid. VeriFun responds by proving the base case and simplifying the induction conclusion in case *sign*(s) = '+' to

$$y \neq 0 \rightarrow x \cdot |t| + g = (x \bmod y) \cdot |t| + g + |t| \cdot (y - 1) \cdot (x/y) + |t| \cdot (x/y) \tag{i}$$

(where e abbreviates *euclid*(y, (x mod y)), g := (e)₁, s := (e)₂ and t := (e)₃) using the induction hypothesis

$$\begin{aligned} \forall x': \mathbb{N} \,\text{let} \{e := \text{euclid}(x', (x \bmod y)), \,\, g := (e)\_1, \,\, s := (e)\_2, \,\, t := (e)\_3;\\ \text{case} \{\text{sign}(s);\\ \text{'+'} : \, x' \cdot |\, s \,| = (x \bmod y) \cdot |\, t \,| + g, \\ \text{'-'} : \, x' \cdot |\, s \,| + g = (x \bmod y) \cdot |\, t \,| \} \end{aligned}$$

and some basic arithmetic properties. We then instruct the system to use the quotient-remainder theorem for replacing x at the left-hand side of the equation in (i) by (x/y)·y+(x mod y) causing eriFun to complete the proof. The system computes a similar proof obligation for case *sign*(s)='−' which is proved in the same way.

By "basic arithmetic properties" we mean well known facts like associativity, commutativity, distributivity, cancellation properties etc. of +, −, ·, /, *gcd*,... which are defined and proved in our arithmetic proof library. These facts are used almost everywhere by the Symbolic Evaluator so that we will not mention their use explicitly in the sequel.

When called to prove Lemma 8 by induction corresponding to the recursion structure of procedure euclid, VeriFun responds by proving the base case and rewrites the step case with the induction hypothesis to

$$y \neq 0 \to \gcd(x, y) = \gcd(y, (x \mod y)).\tag{ii}$$

It then automatically continues with proving (ii) by induction corresponding to the recursion structure of procedure gcd, where it succeeds in both the base and the step case. Lemma 8 is useful because it relates procedure euclid to procedure gcd of our arithmetic proof library, so that all lemmas about gcd can be utilized for the current proofs.

For proving the inverse property

$$\forall x, y: \mathbb{N} \ y \neq 0 \land \gcd(x, y) = 1 \to [x \cdot \Im\_B(x, y) \equiv 1] \mod y \tag{9}$$

of procedure I_B, we call the system to unfold the procedure call I_B(x, y). VeriFun responds by proving the statement for case *sign*(s) = '+' using Bézout's Lemmas 7 and 8 and the library lemmas

$$\forall x, y, z \colon \mathbb{N} \; z \neq 0 \land z \mid x \to [x + y \equiv y] \; mod \; z \tag{10}$$

$$\forall x, y: \mathbb{N} \ y \neq 0 \to y \mid x \cdot y \tag{11}$$

as well as (5), but gets stuck in the remaining case with proof obligation

$$y \neq 0 \land \operatorname{sign}(s) = \text{'}-\text{'} \land g = 1 \to [x \cdot y - x \cdot (|s| \bmod y) \equiv 1] \bmod y \tag{iii}$$

where g abbreviates (*euclid*(x, y))₁ and s stands for (*euclid*(x, y))₂. Proof obligation (iii) represents the unpleasant case of the proof development and necessitates the invention of an auxiliary lemma for completing the proof. After some unsuccessful attempts, we eventually came up with the lemma

$$\forall x, y, z, u \colon \mathbb{N} \, y \neq 0 \land y \mid (x \cdot z + u) \land x \ge u \to [x \cdot y - x \cdot (z \bmod y) \equiv u] \, mod \, y. \tag{12}$$

For proving (iii), we instruct the system to use Lemma 12 for replacing the left-hand side of the congruence in (iii) by g, and VeriFun computes

$$y \neq 0 \land \operatorname{sign}(s) = \text{'}-\text{'} \land g = 1 \to (x \ge g \to y \mid (x \cdot |s| + g)) \land (x < g \to [x \cdot y - x \cdot (|s| \bmod y) \equiv 1] \bmod y) \tag{iv}$$

Now we can call the system to use Bézout's Lemma 7 for replacing x · |s| + g in (iv) by y · |t|, causing VeriFun to complete the proof with Bézout's Lemma 8 and library lemma (11) in case x ≥ g, and otherwise showing that x < g = 1 entails x = 0 and 1 = g = gcd(0, y) = y in turn, so that x · y − x · (|s| mod y) simplifies to 0 and [0 ≡ 1] mod y rewrites to *true*.
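Before its formal proof, the auxiliary lemma (12) can also be sanity-checked by brute force over small naturals (an assumed numeric check using footnote 3's reading of the congruence; of course no substitute for the formal proof):

```python
def lemma_12_holds(x, y, z, u):
    # Checks one instance of lemma (12); a false premise makes it hold.
    if y == 0 or (x * z + u) % y != 0 or x < u:
        return True
    lhs = x * y - x * (z % y)     # stays within N, since z % y < y
    return (lhs - u) % y == 0     # [lhs = u] mod y, per footnote 3

# exhaustive check over a small box of natural numbers
assert all(lemma_12_holds(x, y, z, u)
           for x in range(8) for y in range(8)
           for z in range(8) for u in range(8))
```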

It remains to prove auxiliary lemma (12) for completing the proof of Lemma 9: After being called to use library lemma<sup>4</sup>

$$\forall x, y, z \colon \mathbb{N}\ z \neq 0 \land z \mid (x - y) \land z \mid (y - x) \to [x \equiv y] \bmod z \tag{13}$$

<sup>4</sup> At least one of *<sup>z</sup>*|(*<sup>x</sup>* <sup>−</sup> *<sup>y</sup>*) or *<sup>z</sup>*|(*<sup>y</sup>* <sup>−</sup> *<sup>x</sup>*) holds trivially because subtraction is defined such that *a* − *b* = 0 iff *a* ≤ *b*.

for replacing the left-hand side of the congruence in (12) by u, VeriFun computes

$$y \neq 0 \land y \mid (x \cdot z + u) \land x \ge u \to y \mid (u - (x \cdot y - x \cdot (z \bmod y))) \tag{v}$$

with the library lemmas (11) and

$$\forall x, y, z \colon \mathbb{N} \; z \neq 0 \land [x \equiv y] \; mod \; z \to z \mid (x - y) \tag{14}$$

$$\forall x, y, z, n \colon \mathbb{N} \, n \neq 0 \to \left[ x + y \cdot (z \bmod n) \equiv x + y \cdot z \right] \bmod n. \tag{15}$$

We then command to use library lemma ∀x, y, z:ℕ z ≠ 0 ∧ x ≤ y → x ≤ y · z (with u substituted for x, x for y, and y − (z mod y) for z) after factoring out x, causing VeriFun to prove (v) with the synthesized lemma<sup>5</sup>

$$\forall x, y \colon \mathbb{N} \ y \neq 0 \to y > (x \bmod y). \tag{16}$$

function I_N'(x, k:ℕ):ℕ <=
  if 2 > k
  then k
  else let h := ⌈k/2⌉; r := I_N'((x mod 2 ↑ h), h); y := 2 ↑ k
       in (2 · r + ((r · r mod y) · x mod y) mod y) end let
  end if

function I_N(x, y:ℕ):ℕ <=
  if y ≠ 0 then y − I_N'(x, log2(y)) end if

**Fig. 5.** Computation of multiplicative inverses by Newton-Raphson iteration

### **3.2 Newton's Method**

Newton-Raphson iteration is a major tool in arbitrary-precision arithmetic, and efficient algorithms for computing multiplicative inverses are developed in combination with Hensel Lifting [2]. Figure 5 displays an implementation by procedure I_N for odd numbers x and powers y of 2 (where ↑ computes exponentiation satisfying 0 ↑ 0 = 1). Procedure I_N is defined via procedure I_N', which is obtained from [3], viz. Algorithm 2 *Recursive Hensel*, except that Algorithm 2 uses '−' where I_N' uses '+' in the result term. Algorithm 2 was developed to compute a multiplicative inverse of x modulo p^k for any x not divisible by a prime p, and it returns a negative integer in most cases. By replacing '−' with '+', all calculations can be kept within ℕ so that integer arithmetic is avoided. As procedure I_N' computes the absolute value of the negative integer computed by Algorithm 2, one additional subtraction is needed to obtain a multiplicative inverse, and this subtraction is implemented by procedure I_N. The computation of I_N(x, 2^k) only requires log k steps (compared to k² steps for I_B(x, 2^k)), and therefore I_N is the method of choice for computing a Montgomery Inverse.
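For experimentation, the procedures of Fig. 5 translate almost literally into Python. The sketch below is our transcription, not the verified definition: we assume the trailing mod y in I_N' applies to the whole sum, compute ⌈k/2⌉ as `(k + 1) // 2`, and obtain log2 of a power of 2 via `bit_length`:

```python
def i_n_aux(x, k):
    """Transcription of I_N': for odd x, x * i_n_aux(x, k) is congruent to -1
    modulo 2^k, i.e. (x * i_n_aux(x, k)) % 2**k == 2**k - 1 (cf. Lemma 17)."""
    if 2 > k:
        return k
    h = (k + 1) // 2                      # ceil(k / 2)
    r = i_n_aux(x % 2**h, h)              # inverse at half the precision
    y = 2**k
    return (2 * r + ((r * r % y) * x % y)) % y

def i_n(x, y):
    """Transcription of I_N: the multiplicative inverse of odd x mod y = 2^k."""
    return y - i_n_aux(x, y.bit_length() - 1)   # log2(y) for a power of 2

# Lemma 17 and the inverse property (20), checked for small powers of 2
for k in range(1, 11):
    for x in range(1, 2**k, 2):
        assert (x * i_n_aux(x, k)) % 2**k == 2**k - 1
        assert (x * i_n(x, 2**k)) % 2**k == 1
```

Each recursive call halves the required precision, which is where the log k step count comes from.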

<sup>5</sup> Synthesized lemmas are a spin-off of the system's termination analysis.

However, Algorithm 2 is flawed, so we wasted some time with our verification attempts: The four mod-calls in the algorithm are not needed for correctness, but matter for efficiency as they keep the intermediate numbers small. Yet instead of using modulus 2^k for both inner mod-calls, Algorithm 2 calculates mod 2^⌈k/2⌉, thus spoiling correctness. As the flawed algorithm yields even smaller intermediate numbers, the use of mod 2^⌈k/2⌉ could indeed be beneficial, and therefore it was not obvious to us whether we failed in the verification only because some mathematical argument was missing. But this consideration put us on the wrong track. Eventually frustrated by the unsuccessful verification attempts, we started VeriFun's *Disprover* [1] which, to our surprise, came up with the counterexample x = 3, k = 2 for Lemma 17 in less than a second.<sup>6</sup> We then repaired the algorithm as displayed in Fig. 5 and subsequently verified it (cf. Lemma 20). Later we learned that the fault in Algorithm 2 had not been recognized before and that one cannot do better than to patch it as we did.<sup>7</sup>
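The flaw is easy to replay. With our reading of where the two inner mod-calls sit (an assumption, since Algorithm 2 itself is not reproduced here), using modulus 2^⌈k/2⌉ for them falsifies Lemma 17 exactly at the Disprover's counterexample x = 3, k = 2:

```python
def hensel_flawed(x, k):
    # as published: the two inner mod-calls use 2^ceil(k/2) instead of 2^k
    if 2 > k:
        return k
    h = (k + 1) // 2
    r = hensel_flawed(x % 2**h, h)
    c, y = 2**h, 2**k
    return (2 * r + ((r * r % c) * x % c)) % y

def hensel_fixed(x, k):
    # repaired as in Fig. 5: all mod-calls use the full modulus 2^k
    if 2 > k:
        return k
    h = (k + 1) // 2
    r = hensel_fixed(x % 2**h, h)
    y = 2**k
    return (2 * r + ((r * r % y) * x % y)) % y

x, k = 3, 2                                           # the Disprover's counterexample
assert (x * hensel_flawed(x, k)) % 2**k != 2**k - 1   # Lemma 17 fails
assert (x * hensel_fixed(x, k)) % 2**k == 2**k - 1    # repaired version passes
```
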

For proving the inverse property (20) of procedure I_N, we first have to verify the correctness statement

$$\forall x, k \colon \mathbb{N}\; 2 \nmid x \to (x \cdot \mathfrak{I}_{N'}(x, k) \bmod 2^k) = 2^k - 1 \tag{17}$$

for procedure I_N': We call the system to use induction corresponding to the recursion structure of procedure I_N', which provides the induction hypothesis

$$\forall x' \colon \mathbb{N}\; k \ge 2 \land 2 \nmid x' \to (x' \cdot \mathfrak{I}_{N'}(x', \lceil k/2 \rceil) \bmod 2^{\lceil k/2 \rceil}) = 2^{\lceil k/2 \rceil} - 1. \tag{18}$$

VeriFun proves the base case, but gets stuck in the step case with

$$\begin{aligned} &k \ge 2 \land 2 \nmid x \to \\ &(x \cdot (2A + (x \cdot (A^2 \bmod 2^k) \bmod 2^k) \bmod 2^k) \bmod 2^k) = 2^k - 1 \end{aligned} \tag{i}$$

where A stands for I_N'((x mod 2^⌈k/2⌉), ⌈k/2⌉). By prompting the system to use Lemma 5, proof obligation (i) is simplified to

$$k \ge 2 \land 2 \nmid x \to ((2B + B^2) \bmod 2^k) = 2^k - 1 \tag{ii}$$

(where B abbreviates x · A), thus eliminating the formal clutter resulting from the mod-calls in procedure I_N'. Next we replace 2B + B² by (B + 1)² − 1 and then call the system to replace B by (B/C) · C + R, where C = 2^⌈k/2⌉ and R = ((x mod C) · A mod C), which is justified by the quotient-remainder theorem as R rewrites to (B mod C) by library lemma (5). This results in proof obligation

$$k \ge 2 \land 2 \nmid x \to ((((B/C) \cdot C + R + 1)^2 - 1) \bmod 2^k) = 2^k - 1 \tag{iii}$$

<sup>6</sup> The Disprover is based on two heuristically controlled disproving calculi, and its implementation provides four selectable execution modes (Fast Search, Extended Search, Simple Terms and Structure Expansion). For difficult problems, the user may support the search for counterexamples by presetting some of the universally quantified variables with general terms or concrete values.

<sup>7</sup> Personal communication with Jean-Guillaume Dumas.

and we command to use the induction hypothesis (18) for replacing R in (iii) by C − 1. VeriFun then responds by computing

$$k \ge 2 \land 2 \nmid x \to ((((B/C) \cdot C + C)^2 - 1) \bmod 2^k) = 2^k - 1 \tag{iv}$$

using library lemmas ∀x, y, z:ℕ y ≠ 0 ∧ z ≠ 0 ∧ z | y → [(x mod y) ≡ x] mod z and (5) to prove 2 ∤ (x mod 2^⌈k/2⌉) for justifying the use of the induction hypothesis.

When instructed to factor out C in (iv), the system computes

$$k \ge 2 \land 2 \nmid x \to (((2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 - 1) \bmod 2^k) = 2^k - 1. \tag{v}$$

We command to use library lemma

$$\forall x, y, z \colon \mathbb{N}\; z \neq 0 \land z \nmid x \land z \mid y \land y \ge x \to ((y - x) \bmod z) = z - (x \bmod z) \tag{19}$$
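Library lemma (19) states how a difference behaves under mod; a brute-force check over small naturals (again our own sketch, not part of the development) confirms it:

```python
def lemma_19_holds(x, y, z):
    # z != 0, z does not divide x, z | y, y >= x
    #   ->  ((y - x) mod z) = z - (x mod z)
    if z == 0 or x % z == 0 or y % z != 0 or y < x:
        return True  # premise false
    return (y - x) % z == z - (x % z)

assert all(lemma_19_holds(x, y, z)
           for x in range(30) for y in range(30) for z in range(1, 12))
```
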

for replacing the left-hand side of the equation in (v) yielding

$$k \ge 2 \land 2 \nmid x \to 2^k - \text{(1 } mod \ 2^k) = 2^k - 1 \tag{vi}$$

justified by proof obligation

$$\begin{aligned} &k \ge 2 \land 2 \nmid x \to \\ &2^k \ne 0 \land 2^k \nmid 1 \land 2^k \mid (2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 \land (2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 \ge 1 \end{aligned}$$

which VeriFun simplifies to

$$k \ge 2 \land 2 \nmid x \to 2^k \mid (2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 \tag{vii}$$

in a first step. It then uses auxiliary lemma ∀x:ℕ x ≤ 2 · ⌈x/2⌉ and the library lemmas (11) and ∀x, y, z:ℕ x ≠ 0 ∧ z ≤ y → x^z | x^y for rewriting (vii) subsequently to true. Finally the system simplifies (vi) to true as well by unfolding the call of procedure mod, and Lemma 17 is proved.

When called to verify the inverse property

$$\forall x, y \colon \mathbb{N}\; 2 \nmid x \land 2?(y) \to [x \cdot \mathfrak{I}_N(x, y) \equiv 1] \bmod y \tag{20}$$

of procedure I_N (where 2?(y) decides whether y is a power of 2), VeriFun unfolds the call of procedure I_N and returns

$$y \ge 2 \land 2 \nmid x \land 2?(y) \to ((x \cdot y - x \cdot \mathfrak{I}_{N'}(x, \log_2(y))) \bmod y) = 1. \tag{viii}$$

Now we instruct the system to use library lemma (19) for replacing the left-hand side of the equation in (viii), and VeriFun computes

$$\begin{aligned} &y \ge 2 \land 2 \nmid x \land 2?(y) \to \\ &(x \cdot \mathfrak{I}_{N'}(x, \log_2(y)) \bmod y) \neq 0 \land y - (x \cdot \mathfrak{I}_{N'}(x, \log_2(y)) \bmod y) = 1 \end{aligned} \tag{ix}$$

using auxiliary lemma ∀x, y:ℕ 2?(y) → y > I_N'(x, log2(y)) and the library lemmas (11), (14) and

$$\forall x, y, z \colon \mathbb{N} \ x \cdot y > x \cdot z \to y > z. \tag{21}$$

Finally we let the system use library lemma ∀x:ℕ 2?(x) → 2^log2(x) = x to replace both moduli y in (ix) by 2^log2(y), causing VeriFun to rewrite both occurrences of (x · I_N'(x, log2(y)) mod y) with Lemma 17 to y − 1 and proof obligation (ix) to true in turn, thus completing the proof of (20).

function i(x, y:ℕ):ℕ <= if y ≠ 0 then (x · (y − 1) mod y) end if
function h(x, m, n:ℕ):ℕ <= if n ≠ 0 then (x · m mod n) end if
function ⊗(x, y, m, n:ℕ):ℕ <= if n ≠ 0 then (x · y · I(m, n) mod n) end if
function h⁻¹(x, m, n:ℕ):ℕ <= if n ≠ 0 then (x · I(m, n) mod n) end if
function I(x, y:ℕ):ℕ <= if 2?(y) then I_N(x, y) else I_B(x, y) end if

**Fig. 6.** Procedures for verifying Montgomery Multiplication

# **4 Correctness of Montgomery Multiplication**

We continue by defining procedures for computing the functions i, h, ⊗ and h⁻¹ as displayed in Fig. 6, where we write i(x, y) instead of i_y(x) in the procedures and lemmas. As we aim to prove correctness of Montgomery Multiplication using procedure I_N for computing the Montgomery Inverse with minimal costs, 2 ∤ n ∧ 2?(m) instead of gcd(n, m) = 1 must be demanded to enable the use of Lemma 20 when proving the statements of Theorems 1 and 2. However, the multiplicative inverses of n modulo m and of m modulo n *both* are needed in the *proofs* (whereas only the inverse of n modulo m is used in *applications* of redc and redc*). Consequently procedure I_N cannot be used in the proofs, as it obviously fails in computing the inverse of m modulo n (except for the case n = m = 1, of course). This problem does not arise if procedure I_B is used instead, where gcd(n, m) = 1 is demanded, because I_B(n, m) and I_B(m, n) are the respective inverses for any coprime n and m by Lemma 9. The replacement of I_B by I_N when computing the Montgomery Inverse then must be justified afterwards by additionally proving
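In Python the procedures of Fig. 6 look as follows. Since the definitions of I_B and I_N lie partly outside this section, we substitute Python's built-in modular inverse `pow(x, -1, y)` (Python ≥ 3.8) for I, which is faithful to the inverse property (i); everything else is a direct transcription:

```python
def inv(x, y):
    # stand-in for I(x, y): the multiplicative inverse of x modulo y,
    # defined for gcd(x, y) = 1
    return pow(x, -1, y)

def neg(x, y):
    # i(x, y): the additive inverse -x mod y, computed as x*(y-1) mod y
    return x * (y - 1) % y

def h(x, m, n):
    return x * m % n

def otimes(x, y, m, n):
    return x * y * inv(m, n) % n

def h_inv(x, m, n):
    return x * inv(m, n) % n

m, n = 16, 7                     # a power of 2 and an odd modulus
for a in range(n):
    assert h_inv(h(a, m, n), m, n) == a        # h and h^-1 are inverses
    for b in range(n):
        # multiplication in the Montgomery domain
        assert otimes(h(a, m, n), h(b, m, n), m, n) == h(a * b % n, m, n)
```
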

$$\forall x, y \colon \mathbb{N}\; 2 \nmid x \land 2?(y) \to \mathfrak{I}_B(x, y) = \mathfrak{I}_N(x, y). \tag{22}$$

However, proving (22) would be a complicated and difficult enterprise because the recursion structures of procedures euclid and I_N differ significantly. But we can overcome this obstacle by a simple workaround: We use procedure I of Fig. 6 instead of I_B in the proofs and let the system verify the inverse property

$$\forall x, y \colon \mathbb{N} \ y \neq 0 \land \gcd(x, y) = 1 \to [x \cdot \Im(x, y) \equiv 1] \mod y \tag{i}$$

of procedure I before: VeriFun easily succeeds with library lemma (4) and the inverse property (9) of procedure I_B after being instructed to use library lemma ∀x, y, n:ℕ n ≥ 2 ∧ n | y ∧ gcd(x, y) = 1 → n ∤ x and the inverse property (20) of procedure I_N. Consequently I(n, m) and I(m, n) are the respective multiplicative inverses for any coprime n and m, and therefore I can be used in the proofs. The use of I_N instead of I when computing the Montgomery Inverse is justified afterwards with lemma

$$\forall x, y \colon \mathbb{N}\; 2?(y) \to \mathfrak{I}(x, y) = \mathfrak{I}_N(x, y)$$

having an obviously trivial (and automatic) proof.

Central for the proofs of Theorems 1 and 2 is the key property

$$\begin{aligned} \forall m, n, x \colon \mathbb{N}\; &m > n \land n \cdot m > x \land \gcd(n, m) = 1 \to \\ &\mathit{redc}(x, \mathrm{i}(\mathfrak{I}(n, m), m), m, n) = (x \cdot \mathfrak{I}(m, n) \bmod n) \end{aligned} \tag{23}$$

of procedure redc: For proving Theorem 1.1

$$\begin{aligned} \forall m, n, a \colon \mathbb{N}\; &m > n > a \land \gcd(n, m) = 1 \to \\ &h(a, m, n) = \mathit{redc}(a \cdot (m \cdot m \bmod n), \mathrm{i}(\mathfrak{I}(n, m), m), m, n) \end{aligned} \tag{Thm 1.1}$$

we command to use (23) for replacing the right-hand side of the equation by (a · (m · m mod n) · I(m, n) mod n). The system then replaces the left-hand side of the equation with a · m mod n by unfolding the procedure call h(a, m, n) and simplifies the resulting equation to *true* with Lemma 2, the synthesized lemma (16) and the library lemmas (5) and
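The key property (23) can also be tested numerically with a one-line transcription of redc (again with `pow` standing in for the verified inverse procedures, an assumption of this sketch):

```python
def redc(x, z, m, n):
    # Montgomery reduction: q = (x + n*(x*z mod m)) / m,
    # followed by the conditional subtraction of n
    q = (x + n * (x * z % m)) // m
    return q - n if q >= n else q

m, n = 16, 7                         # m > n, gcd(n, m) = 1
z = (-pow(n, -1, m)) % m             # i(I(n, m), m)
m_inv = pow(m, -1, n)                # I(m, n)
for x in range(n * m):               # the premise n*m > x of (23)
    assert redc(x, z, m, n) == x * m_inv % n
```

The loop confirms that, under the premises of (23), redc returns exactly the canonical representative x · I(m, n) mod n.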

$$\forall x, y, u, v \colon \mathbb{N}\; x > y \land u > v \to x \cdot u > y \cdot v. \tag{24}$$

Theorems 1.2 and 1.3, viz.

$$\begin{aligned} \forall m, n, a, b \colon \mathbb{N}\; &m > n > a \land n > b \land \gcd(n, m) = 1 \\ &\to \otimes(a, b, m, n) = \mathit{redc}(a \cdot b, \mathrm{i}(\mathfrak{I}(n, m), m), m, n) \end{aligned} \tag{Thm 1.2}$$

$$\begin{aligned} \forall m, n, a \colon \mathbb{N}\; &m > n > a \land \gcd(n, m) = 1 \\ &\to h^{-1}(a, m, n) = \mathit{redc}(a, \mathrm{i}(\mathfrak{I}(n, m), m), m, n) \end{aligned} \tag{Thm 1.3}$$

are (automatically) proved in the same way.

Having proved Theorem 1, it remains to verify the key property (23) of procedure redc (before we consider Theorem 2 subsequently). We start by proving that division by m in R_n can be expressed by I: We call the system to prove

$$\forall m, n, x \colon \mathbb{N}\; n \neq 0 \land m \mid x \land \gcd(n, m) = 1 \to [x/m \equiv x \cdot \mathfrak{I}(m, n)] \bmod n \tag{25}$$

and VeriFun automatically succeeds with Lemma 2 and the library lemmas (4) and ∀x, y:ℕ y ≠ 0 ∧ y | x → (x/y) · y = x.

As a consequence of Lemma 25, the quotient q in procedure redc can in particular be expressed in R_n by I (if redc is called with the Montgomery Inverse as actual parameter for the formal parameter z), which is stated by lemma

$$\begin{aligned} \forall m, n, x: &\mathbb{N} \; n \neq 0 \land \gcd(n, m) = 1\\ &\rightarrow \left[ (x + n \cdot (x \cdot \text{i}(\Im(n, m), m) \bmod m)) / m \equiv x \cdot \Im(m, n) \right] \bmod n. \end{aligned} \tag{26}$$

For obtaining a proof, we command to use Lemma 25 for replacing the left-hand side of the congruence in (26) by (x + n · (x · i(I(n, m), m) mod m)) · I(m, n), causing VeriFun to complete the proof using Lemma 3 as well as the library lemmas (5), (10), (11), (15) and ∀x, y:ℕ y ≠ 0 → y | (x + (y − 1) · x).

An obvious correctness demand for the method is that each call of redc (under the given requirements) computes some element of the residue class mod n. This is guaranteed by the conditional subtraction of n from the quotient q in the body of procedure redc. However, a single subtraction of n from q yields the desired property only if n + n > q holds, which is formulated by lemma

$$\forall m, n, x \colon \mathbb{N}\; m \cdot n > x \to n + n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m. \tag{27}$$

We prompt the system to use a case analysis upon m · (n + n) > x + n · (x · i(I(n, m), m) mod m), causing VeriFun to prove the statement in the positive case with the library lemmas (5) and ∀x, y, z:ℕ x · z > y → x > y/z, and to verify it in the negative case with the synthesized lemma (16) and the library lemmas (5), (21) and ∀x, y, u, v:ℕ x > y ∧ u ≥ v → x + u > y + v.

Now the mod n property of procedure redc can be verified by proving lemma

$$\begin{aligned} \forall m, n, x \colon \mathbb{N}\; &m > n \land n \cdot m > x \land \gcd(n, m) = 1 \to \\ &\mathit{redc}(x, \mathrm{i}(\mathfrak{I}(n, m), m), m, n) = (\mathit{redc}(x, \mathrm{i}(\mathfrak{I}(n, m), m), m, n) \bmod n). \end{aligned} \tag{28}$$

We let the system unfold the call of procedure mod in (28), causing VeriFun to use the synthesized lemma (16) for computing the simplified proof obligation

$$m > n \land n \cdot m > x \land \gcd(n, m) = 1 \to n > \mathit{redc}(x, \mathrm{i}(\mathfrak{I}(n, m), m), m, n). \tag{i}$$

Then we command to unfold the call of procedure redc which simplifies to

$$\begin{aligned} m &> n \wedge n \cdot m > x \wedge \gcd(n, m) = 1 \wedge \\ & (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m)) / m \ge n \\ & \to n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m)) / m - n. \end{aligned} \tag{ii}$$

Finally we let the system use library lemma ∀x, y, z:ℕ x > y ∧ y ≥ z → x − z > y − z, resulting in proof obligation

$$\begin{aligned} &m > n \land n \cdot m > x \land \gcd(n, m) = 1 \\ &\land\, (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m \ge n \\ &\to \big( (n + n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m \\ &\qquad\;\, \land\, (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m \ge n \\ &\qquad\;\, \to (n + n) - n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m - n \big) \\ &\to n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m - n \end{aligned} \tag{iii}$$

which simplifies to

$$\begin{aligned} m &> n \wedge n \cdot m > x \wedge \gcd(n, m) = 1 \\ &\wedge\, (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m \ge n \\ &\wedge\, (n + n) - n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m - n \\ &\to n > (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m - n \end{aligned} \tag{iv}$$

by Lemma 27 and to *true* in turn using the plus-minus cancellation.

Now all lemmas for proving the key lemma (23) are available: We demand to use Lemma 28 for replacing the left-hand side of the equation in (23) by (*redc*(x, i(I(n, m), m), m, n) mod n) and to apply lemma (26) for replacing the right-hand side by ((x+n·(x·i(I(n, m), m) mod m))/m mod n) resulting in the simplified proof obligation

$$\begin{aligned} &m > n \land n \cdot m > x \land \gcd(n, m) = 1 \to \\ &[\mathit{redc}(x, \mathrm{i}(\mathfrak{I}(n, m), m), m, n) \equiv (x + n \cdot (x \cdot \mathrm{i}(\mathfrak{I}(n, m), m) \bmod m))/m] \bmod n. \end{aligned} \tag{v}$$

Then we unfold the call of procedure redc causing the system to prove (v) with library lemma (5).

Having proved the key lemma (23), the proof of Theorem 2

$$\begin{aligned} \forall m, n, a, j \colon \mathbb{N}\; &m > n > a \land \gcd(n, m) = 1 \to \\ &(a^j \bmod n) = \mathit{redc}(\mathit{redc}^*(\mathit{redc}(a \cdot M, I, m, n), I, m, n, j), I, m, n) \end{aligned} \tag{Thm 2}$$

(where M = ((m·m) mod n) and I = i(I(n, m), m)) is easily obtained by support of a further lemma, viz.

$$\begin{aligned} \forall m, n, a, j \colon \mathbb{N}\; &m > n > a \land \gcd(n, m) = 1 \to \\ &(m \cdot a^j \bmod n) = \mathit{redc}^*(\mathit{redc}(a \cdot M, I, m, n), I, m, n, j). \end{aligned} \tag{29}$$

When called to use Peano induction upon j for proving (29), VeriFun proves the base case and rewrites the step case with the induction hypothesis to

$$\begin{aligned} &m > n > a \land \gcd(n, m) = 1 \land j \neq 0 \to \\ &(m \cdot a^{j-1} \cdot a \bmod n) = \mathit{redc}(\mathit{redc}(a \cdot M, I, m, n) \cdot (m \cdot a^{j-1} \bmod n), I, m, n). \end{aligned} \tag{vi}$$

Then we command to replace both calls of redc with the key lemma (23), causing VeriFun to succeed with the lemmas (2), (5), (16) and (24).

Finally the system proves (Thm 2) using lemmas (2), (5), (16), (29) and library lemma ∀x, y, z:ℕ x ≠ 0 ∧ y > z → x · y > z, after being prompted to use (Thm 1.3) for replacing the right-hand side of the equation in (Thm 2).
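The statement of Theorem 2 can likewise be replayed numerically. The recursion of redc* is not reproduced in this section, so the loop below is our reading of it, chosen to satisfy lemma (29), i.e. redc*(ā, I, m, n, j) = m · a^j mod n for ā = a · m mod n:

```python
def redc(x, z, m, n):
    q = (x + n * (x * z % m)) // m
    return q - n if q >= n else q

def redc_star(abar, z, m, n, j):
    # j-fold Montgomery multiplication by abar, starting from m mod n;
    # maintains the invariant of lemma (29)
    r = m % n
    for _ in range(j):
        r = redc(abar * r, z, m, n)
    return r

m, n = 16, 7
M = m * m % n                        # M = (m*m) mod n
I = (-pow(n, -1, m)) % m             # I = i(I(n, m), m)
for a in range(1, n):                # the premise m > n > a
    for j in range(8):
        abar = redc(a * M, I, m, n)  # convert a into the Montgomery domain
        assert redc(redc_star(abar, I, m, n, j), I, m, n) == pow(a, j, n)
```

This is exactly the exponentiation pipeline of (Thm 2): one conversion into the Montgomery domain, j Montgomery multiplications, and one final reduction back.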

# **5 Discussion and Conclusion**

We presented machine-assisted proofs verifying an efficient implementation of Montgomery Multiplication, where we developed the proofs ourselves as we are not aware of respective proofs published elsewhere. Our work also uncovered a serious fault in a published algorithm for computing multiplicative inverses based on Newton-Raphson Iteration [3], which could have had dangerous consequences (particularly when used in cryptographic applications) had it remained undetected.


**Fig. 7.** Proof statistics

Figure 7 displays the effort for obtaining the proofs (including all procedures and lemmas which had been imported from our arithmetic proof library). Column *Proc.* counts the number of user defined procedures (the recursively defined ones given in parentheses), *Lem.* is the number of user defined lemmas (the number of synthesized lemmas given in parentheses), and *Rules* counts the total number of *HPL*-proof rule applications, separated into user invoked (*User* ) and system initiated (*System*) ones (with the number of uses of *Induction* given in parentheses). Column *%* gives the automation degree, i.e. the ratio between *System* and *Rules*, *Steps* lists the number of first-order proof steps performed by the Symbolic Evaluator and *Time* displays the runtime of the Symbolic Evaluator.<sup>8</sup>

The first two rows show the effort for proving Lemmas 9 and 20 as illustrated in Sect. 3. As can be observed from the numbers, verifying the computation of multiplicative inverses by Newton-Raphson Iteration is much more challenging for the system and for the user than the method based on Bézout's Lemma. The row *Theorems 1 and 2* below displays the effort for proving Theorems 1 and 2 as illustrated in Sect. 4 (with the effort for the proofs of Lemmas 9 and 20 included).

The numbers in Fig. 7 almost coincide with the statistics obtained for other case studies in Number Theory performed with the system (see e.g. [14] and also [7] for more examples), viz. an automation degree of ∼85% and a success rate of ∼95% for the induction heuristic. All termination proofs (hence all required induction axioms in turn) had been obtained without user support, where 6 of the 12 recursively defined procedures, viz. mod, /, gcd, log2, euclid and I_N', do not terminate by structural recursion.<sup>9</sup> While an automation degree of up to 100% can be achieved in mathematically simple domains, e.g. when sorting lists [7,9], values of 85% and below are not that satisfying when concerned with *automated* reasoning. The cause is that in Number Theory quite often elaborate ideas for developing a proof are needed which are beyond the ability of the system's heuristics guiding the proof search.<sup>10</sup> We are also not aware of other reasoning systems offering more machine support for obtaining proofs in this difficult domain.

<sup>8</sup> Time refers to running VeriFun 3.5 under *Windows 7 Enterprise* with an INTEL Core i7-2640M 2.80 GHz CPU using Java 1.8.0 45.

<sup>9</sup> Procedure 2?(...) is not user defined, but synthesized as the *domain procedure* [12] of the incompletely defined procedure log2. <sup>10</sup> Examples are the use of the quotient-remainder theorem for proving (i) in Sect. 3.1

and (iii) in Sect. 3.2, which are the essential proof steps there, although more complex proof obligations result.

From the user's perspective, this case study necessitated more work than expected, and it was a novel experience for us to spend some effort on verifying a very small and non-recursively defined procedure. The reason is that correctness of procedure redc depends on some non-obvious and tricky number-theoretic principles, which made it difficult to spot the required lemmas. In fact, almost all effort was spent on the invention of the auxiliary lemmas in Sect. 4 and of Lemma 12 in Sect. 3.1. Once the "right" lemma for verifying a given proof obligation was eventually found, its proof turned out to be a routine task. The proof of Lemma 17 is an exception, as it required some thought to create it and some effort as well to lead the system (thus spoiling the proof statistics). Proof development was significantly supported by the system's *Disprover* [1] which (besides detecting the fault in Algorithm 2) often helped us not to waste time trying to prove a false conjecture, where the computed counterexamples provided useful hints for debugging a lemma draft.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Inner and Outer Approximating Flowpipes for Delay Differential Equations**

Eric Goubault, Sylvie Putot(B), and Lorenz Sahlmann

LIX, CNRS and Ecole Polytechnique, ´ Palaiseau, France putot@lix.polytechnique.fr

**Abstract.** Delay differential equations are fundamental for modeling networked control systems where the underlying network induces delay for retrieving values from sensors or delivering orders to actuators. They are notoriously difficult to integrate as these are actually functional equations, the initial state being a function. We propose a scheme to compute inner and outer-approximating flowpipes for such equations with uncertain initial states and parameters. Inner-approximating flowpipes are guaranteed to contain only reachable states, while outer-approximating flowpipes enclose all reachable states. We also introduce a notion of robust inner-approximation, which we believe opens promising perspectives for verification, beyond property falsification. The efficiency of our approach relies on the combination of Taylor models in time, with an abstraction or parameterization in space based on affine forms, or zonotopes. It also relies on an extension of the mean-value theorem, which allows us to deduce inner-approximating flowpipes, from flowpipes outer-approximating the solution of the DDE and its Jacobian with respect to constant but uncertain parameters and initial conditions. We present some experimental results obtained with our C++ implementation.

# **1 Introduction**

Nowadays, many systems are composed of networks of control systems. These systems are highly critical, and formal verification is an essential element for their social acceptability. When the components of the system to model are distributed, delays are naturally introduced in the feedback loop. They may significantly alter the dynamics, and impact safety properties that we want to ensure for the system. The natural model for dynamical systems with such delays is Delay Differential Equations (DDE), in which time derivatives not only depend on the current state, but also on past states. Reachability analysis, which involves computing the set of states reached by the dynamics, is a fundamental tool for the verification of such systems. As the reachable sets are not exactly computable, approximations are used. In particular, outer (also called over)-approximating flowpipes are used to prove that error states will never be reached, whereas inner (also called under)-approximating flowpipes are used to prove that desired states will actually be reached, or to falsify properties. We propose in this article a method to compute both outer- and inner-approximating flowpipes for DDEs.

We concentrate on systems that can be modeled as parametric fixed-delay systems of DDEs, where both the initial condition and right-hand side of the system depend on uncertain parameters, but with a unique constant and exactly known delay:

$$\begin{cases} \dot{z}(t) = f(z(t), z(t-\tau), \beta) & \text{if } t \in [t\_0 + \tau, T] \\ z(t) = z\_0(t, \beta) & \text{if } t \in [t\_0, t\_0 + \tau] \end{cases} \tag{1}$$

where the continuous vector of variables z belongs to a state-space domain D ⊆ ℝⁿ, the (constant) vector of parameters β belongs to the domain B ⊆ ℝᵐ, and f : D × D × B → D is C<sup>∞</sup> and such that Eq. (1) admits a unique solution<sup>1</sup> on the time interval [t₀, T]. The initial condition is defined on t ∈ [t₀, t₀ + τ] by a function z₀ : ℝ⁺ × B → D. The method introduced here also applies when the set of initial states is given as the solution of an uncertain system of ODEs instead of being defined by a function; only the initialization of the algorithm differs. When several constant delays occur in the system, the description of the method becomes more complicated, but the same method applies.

*Example 1.* We will exemplify our method throughout the paper on the system

$$\begin{cases} \dot{x}(t) = -x(t) \cdot x(t - \tau) =: f\left(x(t), x(t - \tau), \beta\right) & t \in [0, T] \\ x(t) = x\_0(t, \beta) = (1 + \beta t)^2 & t \in [-\tau, 0] \end{cases}$$

We take β ∈ [1/3, 1], which defines a family of initial functions, and we fix τ = 1.

This system is a simple but not completely trivial example, for which we have an analytical solution on the first time steps, as detailed in Example 4.
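To get a feeling for Example 1, one can integrate it pointwise by the classical method of steps (here a plain Euler scheme; this is our own illustrative sketch, not the guaranteed flowpipe computation of the paper). On [0, τ] the delayed argument x(t − τ) is supplied by the initial function, so for β = 1 the DDE reduces to ẋ(t) = −t² x(t) with x(0) = 1, whose exact solution is x(t) = exp(−t³/3); the scheme reproduces it closely:

```python
import math

def integrate_example1(beta, tau=1.0, T=1.0, h=1e-3):
    # method of steps with explicit Euler on a grid of width h;
    # the history on [-tau, 0] comes from the initial function (1 + beta*t)^2
    n_hist = round(tau / h)
    xs = [(1 + beta * (-tau + i * h)) ** 2 for i in range(n_hist + 1)]
    for _ in range(round(T / h)):
        x_now = xs[-1]
        x_delayed = xs[-1 - n_hist]          # x(t - tau) on the stored grid
        xs.append(x_now + h * (-x_now * x_delayed))
    return xs[-1]                            # approximation of x(T)

# for beta = 1: x(1) = exp(-1/3)
assert abs(integrate_example1(1.0) - math.exp(-1 / 3)) < 1e-2
```

Such a pointwise simulation handles one parameter value at a time; the flowpipes of the following sections instead enclose the whole family of solutions over β ∈ [1/3, 1] with guaranteed bounds.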

*Contributions and Outline.* In this work, we extend the method introduced by Goubault and Putot [16] for ODEs, to the computation of inner and outer flowpipes of systems of DDEs. We claim, and experimentally demonstrate with our prototype implementation, that the method we propose here for DDEs is both simple and efficient. Relying on outer-approximations and generalized interval computations, all computations can be safely rounded, so that the results are guaranteed to be sound. Finally, we can compute inner-approximating flowpipes combining existentially and universally quantified parameters, which offers some strong potential for property verification, beyond falsification.

In Sect. 2, we first define the notions of inner and outer-approximating flowpipes, as well as robust inner-approximations, and state some preliminaries on generalized interval computations, which are instrumental in our inner flowpipes computations. We then present in Sect. 3 our method for outer-approximating

¹ We refer the reader to [12,27] for the conditions on $f$.

solutions to DDEs. It is based on the combination of Taylor models in time with a space abstraction relying on zonotopes. Section 4 relies on this approach to compute outer-approximations of the Jacobian of the solution of the DDE with respect to the uncertain parameters, using variational equations. Inner-approximating tubes are obtained from these using a generalized mean-value theorem introduced in Sect. 2. We finally demonstrate our method in Sect. 5, using our C++ prototype implementation, and show its superiority in terms of accuracy and efficiency compared to the state of the art.

*Related Work.* Reachability analysis for systems described by ordinary differential equations, and their extension to hybrid systems, has been an active topic of research in the last decades. Outer-approximations have been computed with ellipsoidal [20] and sub-polyhedral techniques, such as zonotopes or support functions, and with Taylor-model-based methods, for both linear and nonlinear systems [2,4–6,10,14,17,26]. A number of corresponding implementations exist [1,3,7,13,22,25,29]. Far fewer methods have been proposed that address the more difficult problem of inner-approximation. The existing approaches use ellipsoids [21] or non-linear approximations [8,16,19,31], but they are often computationally costly and imprecise. Recently, an interval-based method [24] was introduced for bracketing the positive invariant set of a system without relying on integration. However, it relies on space discretization and has only been applied successfully, as far as we know, to low-dimensional systems.

Taylor methods for outer-approximating reachable sets of DDEs have been used only recently, in [28,32]. We will demonstrate that our approach improves the efficiency and accuracy over these interval-based Taylor methods.

The only previous work we know of for computing inner-approximations of solutions to DDEs is the method of Xue et al. [30], extending the approach proposed for ODEs in [31]. Their method is based on a topological condition and a careful inspection of what happens at the boundary of the initial condition. We provide, in the section dedicated to experiments, a comparison to the few experimental results given in [30].

### **2 Preliminaries on Outer and Inner Approximations**

**Notations and Definitions.** Let us introduce some notations that we will use throughout the paper. Set-valued quantities, scalar or vector valued, corresponding to uncertain inputs or parameters, are noted with bold letters, e.g., $\boldsymbol{x}$. When an approximation is introduced by computation, we add brackets: outer-approximating enclosures are noted in bold and enclosed within inward-facing brackets, e.g., $[\boldsymbol{x}]$, and inner-approximations are noted in bold and enclosed within outward-facing brackets, e.g., $]\boldsymbol{x}[$.

An outer-approximating extension of a function $f : \mathbb{R}^m \to \mathbb{R}^n$ is a function $[\boldsymbol{f}] : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$, such that for all $\boldsymbol{x}$ in $\mathcal{P}(\mathbb{R}^m)$, $\operatorname{range}(f, \boldsymbol{x}) = \{f(x),\ x \in \boldsymbol{x}\} \subseteq [\boldsymbol{f}](\boldsymbol{x})$. Dually, inner-approximations determine a set of values proved to belong to the range of the function over some input set. An inner-approximating extension of $f$ is a function $]\boldsymbol{f}[\ : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$, such that for all $\boldsymbol{x}$ in $\mathcal{P}(\mathbb{R}^m)$, $]\boldsymbol{f}[(\boldsymbol{x}) \subseteq \operatorname{range}(f, \boldsymbol{x})$. Inner and outer approximations can be interpreted as quantified propositions: $\operatorname{range}(f, \boldsymbol{x}) \subseteq [\boldsymbol{z}]$ can be written $(\forall x \in \boldsymbol{x})\ (\exists z \in [\boldsymbol{z}])\ (f(x) = z)$, while $]\boldsymbol{z}[\ \subseteq \operatorname{range}(f, \boldsymbol{x})$ can be written $(\forall z \in\ ]\boldsymbol{z}[)\ (\exists x \in \boldsymbol{x})\ (f(x) = z)$.

Let $\varphi(t, \beta)$ for time $t \ge t_0$ denote the time trajectory of the dynamical system (1) for a parameter value $\beta$, and $\boldsymbol{z}(t, \boldsymbol{\beta}) = \{\varphi(t, \beta),\ \beta \in \boldsymbol{\beta}\}$ the set of states reachable at time $t$ for the set of parameter values $\boldsymbol{\beta}$. We extend the notion of outer- and inner-approximations to the case where the function is the solution $\varphi(t, \beta)$ of system (1) over the set $\boldsymbol{\beta}$. An outer-approximating flowpipe is given by an outer-approximation of the set of reachable states, for all $t$ in a time interval:

**Definition 1 (Outer-approximation).** *Given a vector of uncertain (constant) parameters or inputs* $\beta \in \boldsymbol{\beta}$*, an outer-approximation at time* $t$ *of the reachable set of states is* $[\boldsymbol{z}](t, \boldsymbol{\beta}) \supseteq \boldsymbol{z}(t, \boldsymbol{\beta})$*, such that* $(\forall \beta \in \boldsymbol{\beta})\ (\exists z \in [\boldsymbol{z}](t, \boldsymbol{\beta}))\ (\varphi(t, \beta) = z)$*.*

**Definition 2 (Inner-approximation).** *Given a vector of uncertain (constant) parameters or inputs* $\beta \in \boldsymbol{\beta}$*, an inner-approximation at time* $t$ *of the reachable set is* $]\boldsymbol{z}[(t, \boldsymbol{\beta}) \subseteq \boldsymbol{z}(t, \boldsymbol{\beta})$ *such that* $(\forall z \in\ ]\boldsymbol{z}[(t, \boldsymbol{\beta}))\ (\exists \beta \in \boldsymbol{\beta})\ (\varphi(t, \beta) = z)$*.*

In words, any point of the inner flowpipe is the solution at time $t$ of system (1) for some value of $\beta \in \boldsymbol{\beta}$. If the outer and inner approximations are computed accurately, they approximate the exact reachable set with arbitrary precision.

Our method will also solve the more general robust inner-approximation problem of finding an inner-approximation of the reachable set, robust to uncertainty on an uncontrollable subset $\beta_{\mathcal{A}}$ of the vector of parameters $\beta$:

**Definition 3 (Robust inner-approximation).** *Given a vector of uncertain (constant) parameters or inputs* $\beta = (\beta_{\mathcal{A}}, \beta_{\mathcal{E}}) \in \boldsymbol{\beta}$*, an inner-approximation of the reachable set* $\boldsymbol{z}(t, \boldsymbol{\beta})$ *at time* $t$*, robust with respect to* $\beta_{\mathcal{A}}$*, is a set* $]\boldsymbol{z}[^{\mathcal{A}}(t, \boldsymbol{\beta}_{\mathcal{A}}, \boldsymbol{\beta}_{\mathcal{E}})$ *such that* $(\forall z \in\ ]\boldsymbol{z}[^{\mathcal{A}}(t, \boldsymbol{\beta}_{\mathcal{A}}, \boldsymbol{\beta}_{\mathcal{E}}))\ (\forall \beta_{\mathcal{A}} \in \boldsymbol{\beta}_{\mathcal{A}})\ (\exists \beta_{\mathcal{E}} \in \boldsymbol{\beta}_{\mathcal{E}})\ (\varphi(t, \beta_{\mathcal{A}}, \beta_{\mathcal{E}}) = z)$*.*

**Outer and Inner Interval Approximations.** Classical intervals are used in many situations to rigorously compute with interval domains instead of reals, usually leading to outer-approximations of function ranges over boxes. We denote the set of classical intervals by $\mathbb{IR} = \{[\underline{x}, \overline{x}],\ \underline{x} \in \mathbb{R},\ \overline{x} \in \mathbb{R},\ \underline{x} \le \overline{x}\}$. Intervals are non-relational abstractions, in the sense that they rigorously approximate each component of a vector function $f$ independently. We thus consider in this section a function $f : \mathbb{R}^m \to \mathbb{R}$. The natural interval extension consists in replacing real operations by their interval counterparts in the expression of the function. A generally more accurate extension relies on a linearization by the mean-value theorem. Suppose $f$ is differentiable over the interval $\boldsymbol{x}$. Then the mean-value theorem implies that $(\forall x_0 \in \boldsymbol{x})\ (\forall x \in \boldsymbol{x})\ (\exists c \in \boldsymbol{x})\ (f(x) = f(x_0) + f'(c)(x - x_0))$. If we can bound the range of the gradient of $f$ over $\boldsymbol{x}$ by $[\boldsymbol{f}'](\boldsymbol{x})$, then we can derive the following interval enclosure, usually called the mean-value extension: for any $x_0 \in \boldsymbol{x}$, $\operatorname{range}(f, \boldsymbol{x}) \subseteq f(x_0) + [\boldsymbol{f}'](\boldsymbol{x})(\boldsymbol{x} - x_0)$.

*Example 2.* Consider $f(x) = x^2 - x$; its range over $\boldsymbol{x} = [2, 3]$ is $[2, 6]$. The natural interval extension of $f$, evaluated on $[2, 3]$, is $[\boldsymbol{f}]([2, 3]) = [2, 3]^2 - [2, 3] = [1, 7]$. The mean-value extension gives $f(2.5) + [\boldsymbol{f}']([2, 3])([2, 3] - 2.5) = [1.25, 6.25]$, using $x_0 = 2.5$ and $[\boldsymbol{f}'](\boldsymbol{x}) = 2\boldsymbol{x} - 1$.
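Both extensions of Example 2 are easy to machine-check. The sketch below uses a toy interval type of our own (plain floating point, without the outward rounding a rigorous implementation would apply) and evaluates both extensions on $f(x) = x^2 - x$ over $[2, 3]$.

```python
# Minimal interval arithmetic sketch (illustrative, not rigorously rounded):
# an interval is a pair (lo, hi) with lo <= hi.

def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def i_mul(a, b):
    ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(ps), max(ps))

def natural_ext(x):
    # natural interval extension of f(x) = x*x - x: each occurrence of x
    # is evaluated independently, losing the dependency between them
    return i_sub(i_mul(x, x), x)

def mean_value_ext(x):
    # mean-value extension: f(x0) + [f'](x)(x - x0), with f'(x) = 2x - 1
    x0 = (x[0] + x[1]) / 2
    df = i_sub(i_mul((2, 2), x), (1, 1))      # [f'](x) = 2x - 1
    fx0 = x0*x0 - x0
    return i_add((fx0, fx0), i_mul(df, i_sub(x, (x0, x0))))

print(natural_ext((2, 3)))     # (1, 7)
print(mean_value_ext((2, 3)))  # (1.25, 6.25)
```

As in the example, the mean-value extension is tighter here than the natural extension, though both over-approximate the exact range $[2, 6]$.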

**Modal Intervals and Kaucher Arithmetic.** The results introduced in this section are mostly based on the work of Goldsztejn *et al.* [15] on modal intervals. Let us first introduce generalized intervals, i.e., intervals whose bounds are not ordered, and the Kaucher arithmetic [18] on these intervals.

The set of generalized intervals is denoted by $\mathbb{IK} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \in \mathbb{R},\ \overline{x} \in \mathbb{R}\}$. Given two real numbers $\underline{x}$ and $\overline{x}$, with $\underline{x} \le \overline{x}$, one can consider two generalized intervals: $[\underline{x}, \overline{x}]$, which is called *proper*, and $[\overline{x}, \underline{x}]$, which is called *improper*. We define $\operatorname{dual}([a, b]) = [b, a]$ and $\operatorname{pro}([a, b]) = [\min(a, b), \max(a, b)]$.

**Definition 4 (**[15]**).** *Let* $f : \mathbb{R}^m \to \mathbb{R}$ *be a continuous function and* $\boldsymbol{x} \in \mathbb{IK}^m$*, which we can decompose in* $\boldsymbol{x}_{\mathcal{A}} \in \mathbb{IR}^p$ *and* $\boldsymbol{x}_{\mathcal{E}} \in (\operatorname{dual} \mathbb{IR})^q$ *with* $p + q = m$*. A generalized interval* $\boldsymbol{z} \in \mathbb{IK}$ *is* $(f, \boldsymbol{x})$*-interpretable if*

$$(\forall x_{\mathcal{A}} \in \boldsymbol{x}_{\mathcal{A}})\ (Q_z z \in \operatorname{pro} \boldsymbol{z})\ (\exists x_{\mathcal{E}} \in \operatorname{pro} \boldsymbol{x}_{\mathcal{E}})\ (f(x) = z) \tag{2}$$

*where* $Q_z = \exists$ *if* $\boldsymbol{z}$ *is proper, and* $Q_z = \forall$ *otherwise.*

When all intervals in (2) are proper, we retrieve the interpretation of classical interval computation, which gives an outer-approximation of $\operatorname{range}(f, \boldsymbol{x})$: $(\forall x \in \boldsymbol{x})\ (\exists z \in [\boldsymbol{z}])\ (f(x) = z)$. When all intervals are improper, (2) yields an inner-approximation of $\operatorname{range}(f, \boldsymbol{x})$: $(\forall z \in \operatorname{pro} \boldsymbol{z})\ (\exists x \in \operatorname{pro} \boldsymbol{x})\ (f(x) = z)$.

Kaucher arithmetic [18] provides a computation on generalized intervals that returns intervals that are interpretable as inner-approximations in some simple cases. Kaucher addition extends addition on classical intervals by $\boldsymbol{x} + \boldsymbol{y} = [\underline{x} + \underline{y}, \overline{x} + \overline{y}]$ and $\boldsymbol{x} - \boldsymbol{y} = [\underline{x} - \overline{y}, \overline{x} - \underline{y}]$. For multiplication, let us decompose $\mathbb{IK}$ in $\mathcal{P} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \ge 0 \wedge \overline{x} \ge 0\}$, $-\mathcal{P} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \le 0 \wedge \overline{x} \le 0\}$, $\mathcal{Z} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \le 0 \le \overline{x}\}$, and $\operatorname{dual} \mathcal{Z} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \overline{x} \le 0 \le \underline{x}\}$. When restricted to proper intervals, Kaucher multiplication coincides with classical interval multiplication. Kaucher multiplication $\boldsymbol{x}\boldsymbol{y}$ extends the classical multiplication to all possible combinations of $\boldsymbol{x}$ and $\boldsymbol{y}$ belonging to these sets. We refer to [18] for more details.
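A toy encoding of generalized intervals (names are our own; no rigorous rounding) makes `dual`, `pro`, and the Kaucher additive operations concrete. Note that, unlike classical intervals, every generalized interval has an exact additive inverse, obtained through `dual`:

```python
# Generalized (Kaucher) intervals: pairs [lo, hi] with NO ordering
# constraint between the bounds. (2, 3) is proper, (3, 2) improper.

def dual(x):
    return (x[1], x[0])

def pro(x):
    # proper projection: reorders the bounds
    return (min(x), max(x))

def k_add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def k_sub(x, y):
    return (x[0] - y[1], x[1] - y[0])

x = (2, 3)
# classical interval behaviour: x - x is not zero (dependency is lost)
print(k_sub(x, x))        # (-1, 1)
# Kaucher arithmetic: subtracting the dual cancels exactly
print(k_sub(x, dual(x)))  # (0, 0)
print(pro((3, 2)))        # (2, 3)
```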

Kaucher arithmetic defines a generalized interval natural extension (see [15]):

**Proposition 1.** *Let* $f : \mathbb{R}^m \to \mathbb{R}$ *be a function given by an arithmetic expression in which each variable appears syntactically only once (and with degree 1). Then for* $\boldsymbol{x} \in \mathbb{IK}^m$*,* $f(\boldsymbol{x})$*, computed using Kaucher arithmetic, is* $(f, \boldsymbol{x})$*-interpretable.*

In some cases, Kaucher arithmetic can thus be used to compute an inner-approximation of $\operatorname{range}(f, \boldsymbol{x})$. But the restriction to functions $f$ with single occurrences of variables, that is, with no dependency, prevents a wide use. A generalized interval mean-value extension allows us to overcome this limitation:

**Theorem 1.** *Let* $f : \mathbb{R}^m \to \mathbb{R}$ *be differentiable, and* $\boldsymbol{x} \in \mathbb{IK}^m$*, which we can decompose in* $\boldsymbol{x}_{\mathcal{A}} \in \mathbb{IR}^p$ *and* $\boldsymbol{x}_{\mathcal{E}} \in (\operatorname{dual} \mathbb{IR})^q$ *with* $p + q = m$*. Suppose that for each* $i \in \{1, \ldots, m\}$*, we can compute* $[\boldsymbol{\Delta}_i] \in \mathbb{IR}$ *such that*

$$\left\{\frac{\partial f}{\partial x_i}(x),\ x \in \operatorname{pro} \boldsymbol{x}\right\} \subseteq [\boldsymbol{\Delta}_i]. \tag{3}$$

*Then, for any* $\tilde{x} \in \operatorname{pro} \boldsymbol{x}$*, the following interval, evaluated with Kaucher arithmetic, is* $(f, \boldsymbol{x})$*-interpretable:*

$$\tilde{f}(\boldsymbol{x}) = f(\tilde{x}) + \sum_{i=1}^{m} [\boldsymbol{\Delta}_i](\boldsymbol{x}_i - \tilde{x}_i). \tag{4}$$

When using (4) for inner-approximation, we can only encounter the following subset of all possible cases in the Kaucher multiplication table: $(\boldsymbol{x} \in \mathcal{P}) \times (\boldsymbol{y} \in \operatorname{dual} \mathcal{Z}) = [\underline{x}\,\underline{y}, \underline{x}\,\overline{y}]$, $(\boldsymbol{x} \in -\mathcal{P}) \times (\boldsymbol{y} \in \operatorname{dual} \mathcal{Z}) = [\overline{x}\,\overline{y}, \overline{x}\,\underline{y}]$, and $(\boldsymbol{x} \in \mathcal{Z}) \times (\boldsymbol{y} \in \operatorname{dual} \mathcal{Z}) = [0, 0]$. Indeed, for an improper $\boldsymbol{x}$ and $\tilde{x} \in \operatorname{pro} \boldsymbol{x}$, it holds that $(\boldsymbol{x} - \tilde{x})$ is in $\operatorname{dual} \mathcal{Z}$. The outer-approximation $[\boldsymbol{\Delta}_i]$ of the Jacobian is a proper interval, thus in $\mathcal{P}$, $-\mathcal{P}$ or $\mathcal{Z}$, and we can deduce from the multiplication rules that the inner-approximation is non-empty only if $[\boldsymbol{\Delta}_i]$ does not contain 0.

*Example 3.* Let $f$ be defined by $f(x) = x^2 - x$, for which we want to compute an inner-approximation of the range over $\boldsymbol{x} = [2, 3]$. Due to the two occurrences of $x$, $f(\operatorname{dual} \boldsymbol{x})$, computed with Kaucher arithmetic, is not $(f, \boldsymbol{x})$-interpretable. The interval $\tilde{f}(\boldsymbol{x}) = f(2.5) + [\boldsymbol{f}']([2, 3])(\boldsymbol{x} - 2.5) = 3.75 + [3, 5](\boldsymbol{x} - 2.5)$ given by its mean-value extension, computed with Kaucher arithmetic, is $(f, \boldsymbol{x})$-interpretable. For $\boldsymbol{x} = \operatorname{dual}[2, 3] = [3, 2]$, using the multiplication rule for $\mathcal{P} \times \operatorname{dual} \mathcal{Z}$, we get $\tilde{f}(\boldsymbol{x}) = 3.75 + [3, 5]([3, 2] - 2.5) = 3.75 + [3, 5][0.5, -0.5] = 3.75 + [1.5, -1.5] = [5.25, 2.25]$, which can be interpreted as: $(\forall z \in [2.25, 5.25])\ (\exists x \in [2, 3])\ (z = f(x))$. Thus, $[2.25, 5.25]$ is an inner-approximation of $\operatorname{range}(f, [2, 3])$.
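The inner-approximation of Example 3 can be replayed mechanically. The sketch below is our own, restricted to the multiplication cases actually reached by the mean-value extension (4) (proper first factor in $\mathcal{P}$, $-\mathcal{P}$ or $\mathcal{Z}$; improper second factor in $\operatorname{dual} \mathcal{Z}$), without rigorous rounding:

```python
# Kaucher multiplication, restricted to the cases reached by the
# mean-value extension (4).

def k_add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def k_sub(x, y):
    return (x[0] - y[1], x[1] - y[0])

def k_mul_dualZ(x, y):
    # x = [lo, hi] proper; y improper with y[1] <= 0 <= y[0] (dual Z)
    assert x[0] <= x[1] and y[1] <= 0 <= y[0]
    if x[0] >= 0:                       # x in P
        return (x[0]*y[0], x[0]*y[1])
    if x[1] <= 0:                       # x in -P
        return (x[1]*y[1], x[1]*y[0])
    return (0.0, 0.0)                   # x in Z: result degenerates to [0,0]

# Example 3: f(x) = x^2 - x over [2,3]; f' enclosed by [3,5]; x~ = 2.5
delta = (3, 5)
x_improper = (3, 2)                     # dual of [2, 3]
arg = k_sub(x_improper, (2.5, 2.5))     # (0.5, -0.5), lies in dual Z
res = k_add((3.75, 3.75), k_mul_dualZ(delta, arg))
print(res)                              # (5.25, 2.25): improper result
inner = (res[1], res[0])                # pro of the result
print(inner)                            # (2.25, 5.25) inner-approximates range(f, [2,3])
```

Every point of $[2.25, 5.25]$ is indeed attained by $f$ on $[2, 3]$, since $\operatorname{range}(f, [2,3]) = [2, 6]$.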

In Sect. 4, we will use Theorem 1 with $f$ being each component (for an $n$-dimensional system) of the solution of the uncertain dynamical system (1): we need an outer enclosure of the solution of the system, and of its Jacobian with respect to the uncertain parameters. This is the objective of the next sections.

### **3 Taylor Method for Outer Flowpipes of DDEs**

We now introduce a Taylor method to compute outer enclosures of the solution of system (1). The principle is to extend a Taylor method for the solution of ODEs to the case of DDEs, in a similar spirit to the existing work [28,32]. This can be done by building a Taylor model version of the method of steps [27], a technique for solving DDEs that reduces these to a sequence of ODEs.

#### **3.1 The Method of Steps for Solving DDEs**

The principle of the method of steps is that on each time interval $[t_0 + i\tau, t_0 + (i+1)\tau]$, for $i \ge 1$, the function $z(t - \tau)$ is a known history function, already computed as the solution of the DDE on the previous time interval $[t_0 + (i-1)\tau, t_0 + i\tau]$. Plugging the solution of the previous ODE into the DDE yields a new ODE on the next time interval: we thus have an initial value problem for an ODE, with $z(t_0 + i\tau)$ defined by the previous ODE. This process is initialized with $z_0(t)$ on the first time interval $[t_0, t_0 + \tau]$. The solution of the DDE can thus be obtained by solving a sequence of IVPs for ODEs. Generally, there is a discontinuity in the first derivative of the solution at $t_0 + \tau$. If this is the case, then because of the term $z(t - \tau)$ in the DDE, a discontinuity will also appear at each $t_0 + i\tau$.

*Example 4.* Consider the DDE defined in Example 1. On $t \in [0, \tau]$, the solution of the DDE is the solution of the ODE

$$\dot{x}(t) = f\big(x(t), x_0(t-\tau, \beta), \beta\big) = -x(t)\big(1 + \beta(t-\tau)\big)^2, \ t \in [0, \tau],$$

with initial value $x(0) = x_0(0, \beta) = 1$. It admits the analytical solution

$$x(t) = \exp\left(-\frac{1}{3\beta}\left(\left(1 + (t-1)\beta\right)^3 - \left(1 - \beta\right)^3\right)\right), \ t \in [0, \tau] \tag{5}$$

The solution of the DDE on the time interval $[\tau, 2\tau]$ is the solution of the ODE

$$\dot{x}(t) = -x(t)\exp\left(-\frac{1}{3\beta}\left(\left(1 + (t-\tau-1)\beta\right)^3 - (1-\beta)^3\right)\right), \ t \in [\tau, 2\tau].$$
 

with initial value $x(\tau)$ given by (5). An analytical solution can be computed, using the transcendental lower incomplete $\gamma$ function.
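The method of steps is easy to exercise numerically on Example 1. The sketch below is a plain explicit-Euler implementation of our own (not the paper's guaranteed Taylor method): on each segment $[i\tau, (i+1)\tau]$ the delayed term is read from the previously computed segment, and on $[0, \tau]$ the result can be compared with the closed form (5).

```python
import math

# Method of steps for Example 1 with explicit Euler (a non-guaranteed
# numerical sketch; the paper's method uses guaranteed Taylor models).

def solve_dde_steps(beta, T, tau=1.0, h=1e-3):
    n_per = int(round(tau / h))
    # history segment on [-tau, 0]: x0(t, beta) = (1 + beta*t)^2
    hist = [(1 + beta * (-tau + j*h))**2 for j in range(n_per + 1)]
    x = hist[-1]                     # x(0) = x0(0, beta) = 1
    traj = [x]
    seg = []
    t = 0.0
    while t < T - 1e-12:
        x_delay = hist[len(seg)]     # x(t - tau), from the previous segment
        seg.append(x)
        x += h * (-x * x_delay)      # Euler step on x'(t) = -x(t) x(t - tau)
        t += h
        traj.append(x)
        if len(seg) == n_per:        # segment finished: it becomes the history
            hist = seg + [x]
            seg = []
    return traj

def exact_first_segment(t, beta):
    # analytical solution (5), valid for t in [0, tau] with tau = 1
    return math.exp(-((1 + (t - 1)*beta)**3 - (1 - beta)**3) / (3*beta))

beta = 0.5
traj = solve_dde_steps(beta, T=1.0)
err = abs(traj[-1] - exact_first_segment(1.0, beta))
print(err)   # small first-order discretization error
```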

#### **3.2 Finite Representation of Functions as Taylor Models**

A sufficiently smooth function $g$ (e.g., $C^\infty$) can be represented on a time interval $[t_0, t_0 + h]$ by a Taylor expansion

$$g(t) = \sum_{i=0}^{k} (t - t_0)^i g^{[i]}(t_0) + (t - t_0)^{k+1} g^{[k+1]}(\xi), \tag{6}$$

with $\xi \in [t_0, t_0 + h]$, and using the notation $g^{[i]}(t) := \frac{g^{(i)}(t)}{i!}$. We will use such Taylor expansions to represent the solution $z(t)$ of the DDE on each time interval $[t_0 + i\tau, t_0 + (i+1)\tau]$, starting with the initial condition $z_0(t, \beta)$ on $[t_0, t_0 + \tau]$. For more accuracy, we actually define these expansions piecewise, on a finer time grid of fixed time step $h$. The function $z_0(t, \beta)$ on the time interval $[t_0, t_0 + \tau]$ is thus represented by $p = \tau/h$ Taylor expansions. The $l$-th such Taylor expansion, valid on the time interval $[t_0 + lh, t_0 + (l+1)h]$ with $l \in \{0, \ldots, p-1\}$, is

$$z_0(t, \beta) = \sum_{i=0}^{k} (t - t_0 - lh)^i z_0^{[i]}(t_0 + lh, \beta) + (t - t_0 - lh)^{k+1} z_0^{[k+1]}(\xi_l, \beta), \tag{7}$$

for some $\xi_l \in [t_0 + lh, t_0 + (l+1)h]$.
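For the polynomial initial condition of Example 1, an order-2 Taylor expansion around each grid point is exact (the remainder vanishes), which makes the piecewise representation (7) easy to check numerically. The sketch below (our own, for a single scalar $\beta$) builds the $p = \tau/h$ expansions and compares them against the initial function:

```python
# Piecewise Taylor representation (7) of the initial condition of
# Example 1, x0(t, beta) = (1 + beta*t)^2, on [t0, t0+tau] with t0 = -1,
# tau = 1. Since x0 is a degree-2 polynomial in t, the order-2 expansion
# around each grid point is exact.

t0, tau, p = -1.0, 1.0, 4
h = tau / p

def x0(t, beta):
    return (1 + beta*t)**2

def taylor_coeffs(tl, beta):
    # g^[i](tl) = g^(i)(tl)/i! for i = 0, 1, 2:
    # x0' = 2*beta*(1 + beta*t), x0''/2! = beta^2
    return [x0(tl, beta), 2*beta*(1 + beta*tl), beta**2]

def eval_piecewise(t, beta):
    l = min(int((t - t0) / h), p - 1)   # index of the expansion valid at t
    tl = t0 + l*h
    c0, c1, c2 = taylor_coeffs(tl, beta)
    dt = t - tl
    return c0 + c1*dt + c2*dt**2

beta = 0.75
for t in (-1.0, -0.9, -0.45, -0.1, 0.0):
    assert abs(eval_piecewise(t, beta) - x0(t, beta)) < 1e-12
print("piecewise order-2 Taylor models reproduce x0 exactly")
```

For a non-polynomial history function, the same scheme would carry the order-$(k{+}1)$ remainder term of (7), evaluated over each grid cell.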

#### **3.3 An Abstract Taylor Model Representation**

In a rigorous version of the expansion (7), the $z_0^{[i]}(t_0 + lh, \beta)$ as well as $z_0^{[k+1]}(\xi_l, \beta)$ are set-valued, as the vector of parameters $\beta$ is set-valued. The simplest way to account for these uncertainties is to use intervals. However, this approach suffers heavily from the wrapping effect, as these uncertainties accumulate with integration time. A more accurate alternative is to use a Taylor form in the parameters $\beta$ for each $z_0^{[i]}(t_0 + lh, \beta)$; this is however very costly. We choose in this work to use a sub-polyhedral abstraction to parameterize the Taylor coefficients, expressing some sensitivity of the model to the uncertain parameters: we rely on affine forms [9]. The result can be seen as a Taylor model of arbitrary order in time, and of order close to 1 in the parameter space.

The vector of uncertain parameters or inputs $\beta \in \boldsymbol{\beta}$ is thus defined as a vector of affine forms over $m$ symbolic variables $\varepsilon_i \in [-1, 1]$: $\boldsymbol{\beta} = \alpha_0 + \sum_{i=1}^{m} \alpha_i \varepsilon_i$, where the coefficients $\alpha_i$ are vectors of real numbers. This abstraction describes the set of values of the parameters as given within a zonotope. In the sequel, we will use for zonotopes the same bold-letter notation as for intervals, accounting for set-valued quantities.

*Example 5.* In Example 1, $\boldsymbol{\beta} = [\frac{1}{3}, 1]$ can be represented by the centered form $\boldsymbol{\beta} = \frac{2}{3} + \frac{1}{3}\varepsilon_1$. The set of initial conditions $\boldsymbol{x}_0(t, \boldsymbol{\beta})$ is abstracted as a function of the noise symbol $\varepsilon_1$. For example, at $t = -1$, $\boldsymbol{x}_0(-1, \boldsymbol{\beta}) = (1 - \boldsymbol{\beta})^2 = (1 - \frac{2}{3} - \frac{1}{3}\varepsilon_1)^2 = \frac{1}{9}(1 - \varepsilon_1)^2$. The abstraction of affine arithmetic operators is computed componentwise on the noise symbols $\varepsilon_i$, and does not introduce any over-approximation. The abstraction of non-affine operations is conservative: an affine approximation of the result is computed, and a new noise term is added that accounts for the approximation error. Here, using $\varepsilon_1^2 \in [0, 1]$, affine arithmetic [9] will yield $[\boldsymbol{x}_0](-1, \boldsymbol{\beta}) = \frac{1}{9}(1 - 2\varepsilon_1 + [0, 1]) = \frac{1}{9}(1.5 - 2\varepsilon_1 + 0.5\varepsilon_2)$, with $\varepsilon_2 \in [-1, 1]$. We are now using the notation $[\boldsymbol{x}_0]$, denoting an outer-approximation. Indeed, the abstraction is conservative: $[\boldsymbol{x}_0](-1, \boldsymbol{\beta})$ takes its values in $\frac{1}{9}[-1, 4]$, while the exact range of $\boldsymbol{x}_0(-1, \beta)$ for $\beta \in [\frac{1}{3}, 1]$ is $\frac{1}{9}[0, 4]$.
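The computation of Example 5 can be replayed with a tiny affine-form type of our own (a dict mapping noise-symbol index to coefficient, index 0 being the center); a real implementation, such as the paper's prototype, would also round the coefficients outward:

```python
# Minimal affine arithmetic: a form is {0: center, i: coeff of eps_i},
# with every eps_i ranging over [-1, 1].

def a_range(a):
    rad = sum(abs(c) for i, c in a.items() if i != 0)
    return (a[0] - rad, a[0] + rad)

def a_neg(a):
    return {i: -c for i, c in a.items()}

def a_add_const(a, k):
    b = dict(a)
    b[0] = b.get(0, 0.0) + k
    return b

def a_square(a, fresh):
    # conservative square of a single-symbol form c + d*eps_i:
    # (c + d*eps)^2 = c^2 + 2cd*eps + d^2*eps^2, with eps^2 in [0, 1]
    # abstracted as d^2/2 + (d^2/2)*eps_fresh.
    (i, d), = [(j, c) for j, c in a.items() if j != 0]
    c = a.get(0, 0.0)
    return {0: c*c + d*d/2, i: 2*c*d, fresh: d*d/2}

# Example 5: beta = 2/3 + 1/3 eps1, and x0(-1, beta) = (1 - beta)^2
beta = {0: 2/3, 1: 1/3}
one_minus_beta = a_add_const(a_neg(beta), 1.0)   # 1/3 - 1/3 eps1
x0_form = a_square(one_minus_beta, fresh=2)      # (1.5 - 2 eps1 + 0.5 eps2)/9
print(x0_form)        # {0: 1.5/9, 1: -2/9, 2: 0.5/9}, up to rounding
print(a_range(x0_form))  # about (-1/9, 4/9), versus the exact range (0, 4/9)
```

The concretization $\frac{1}{9}[-1, 4]$ of the form matches the over-approximation discussed in the example.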

Now, we can represent the initial solution of the DDE (1) for $t \in [t_0, t_0 + \tau]$ as a Taylor model in time with zonotopic coefficients, by evaluating in affine arithmetic the coefficients of its Taylor model (7). Noting $\boldsymbol{r}_{0j} = [t_0 + jh, t_0 + (j+1)h]$, we write, for all $j = 0, \ldots, p-1$,

$$[\boldsymbol{z}](t) = \sum_{l=0}^{k-1} (t - t_0 - jh)^l [\boldsymbol{z}_{0j}]^{[l]} + (t - t_0 - jh)^k [\overline{\boldsymbol{z}}_{0j}]^{[k]}, \ t \in \boldsymbol{r}_{0j} \tag{8}$$

where the Taylor coefficients

$$[\boldsymbol{z}_{0j}]^{[l]} := \frac{[\boldsymbol{z}_0]^{(l)}(t_0 + jh, \boldsymbol{\beta})}{l!}, \quad [\overline{\boldsymbol{z}}_{0j}]^{[l]} := \frac{[\boldsymbol{z}_0]^{(l)}(\boldsymbol{r}_{0j}, \boldsymbol{\beta})}{l!} \tag{9}$$

can be computed by differentiating the initial solution with respect to $t$ ($[\boldsymbol{z}_0]^{(l)}$ denotes the $l$-th time derivative), and evaluating the result in affine arithmetic.

*Example 6.* Suppose we want to build a Taylor model of order $k = 2$ for the initial condition in Example 1, on a grid of step size $h = 1/3$. Consider the Taylor model for the first step $[t_0, t_0 + h] = [-1, -2/3]$: we need to evaluate $[\boldsymbol{x}_{00}]^{[0]} = [\boldsymbol{x}_0](-1, \boldsymbol{\beta})$, which was done in Example 5.

We also need $[\boldsymbol{x}_{00}]^{[1]}$ and $[\overline{\boldsymbol{x}}_{00}]^{[2]}$. We compute $[\boldsymbol{x}_{00}]^{[1]} = [\dot{\boldsymbol{x}}_0](-1, \boldsymbol{\beta}) = 2\boldsymbol{\beta}(1 - \boldsymbol{\beta})$ and $[\overline{\boldsymbol{x}}_{00}]^{[2]} = [\boldsymbol{x}_0]^{(2)}(\boldsymbol{r}_{00})/2 = [\ddot{\boldsymbol{x}}_0](\boldsymbol{r}_{00})/2 = \boldsymbol{\beta}^2$, with $\boldsymbol{\beta} = \frac{2}{3} + \frac{1}{3}\varepsilon_1$. We evaluate these coefficients with affine arithmetic, similarly to Example 5.

#### **3.4 Constructing Flowpipes**

The abstract Taylor models (8) introduced in Sect. 3.3 define piecewise outer-approximating flowpipes of the solution on $[t_0, t_0 + \tau]$. Using the method of steps, and plugging into (1) the solution computed on $[t_0 + (i-1)\tau, t_0 + i\tau]$, the solution of (1) can be computed by solving the sequence of ODEs

$$\dot{z}(t) = f(z(t), z(t-\tau), \beta), \text{ for } t \in [t_0 + i\tau, t_0 + (i+1)\tau] \tag{10}$$

where the initial condition $z(t_0 + i\tau)$, and $z(t - \tau)$ for $t$ in $[t_0 + i\tau, t_0 + (i+1)\tau]$, are fully defined by (8) when $i = 1$, and by the solution of (10) at the previous step when $i$ is greater than 1.

Let the set of solutions of (10) at time $t$, for initial conditions $z(t') \in \boldsymbol{z}'$ at some initial time $t' \ge t_0$, be denoted by $\boldsymbol{z}(t, t', \boldsymbol{z}')$. Using a Taylor method for ODEs, we can compute flowpipes that are guaranteed to contain the reachable set of the solutions $\boldsymbol{z}(t, t_0 + \tau, [\boldsymbol{z}](t_0 + \tau))$ of (10), for all times $t$ in $[t_0 + \tau, t_0 + 2\tau]$, with $[\boldsymbol{z}](t_0 + \tau)$ given by the evaluation of the Taylor model (8). This can be iterated for further steps of length $\tau$, solving (10) for $i = 1, \ldots, T/\tau$, with an initial condition given by the evaluation of the Taylor model for (10) at the previous step.

We now detail the algorithm that results from this principle. Flowpipes are built using two levels of grids. At each step on the coarser grid, with step size $\tau$, we define a new ODE. We build the Taylor models for the solution of this ODE on the finer grid, of integration step size $h = \tau/p$. We note $t_i = t_0 + i\tau$ the points of the coarser grid, and $t_{ij} = t_0 + i\tau + jh$ the points of the finer grid. In order to compute the flowpipes in a piecewise manner on this grid, the Taylor method relies on Algorithm 1. All Taylor coefficients, as well as Taylor expansion evaluations, are computed in affine arithmetic.

```
Build by (9) the [z_0j]^[l], j ∈ {0,...,p−1}, that define the Taylor model
  on [t_0, t_0+τ], and initialize the next flowpipe:
  [z_10] = [z_0](t_10, β) at t_10 = t_0 + τ
For all i = 0,...,T/τ do
  For all j = 0,...,p−1 do
    Step 1: compute an a priori enclosure [z̄_ij] of z(t) valid on [t_ij, t_i(j+1)]
    Step 2: build by (12)–(14) a Taylor model valid on [t_ij, t_i(j+1)]
    Using (11), initialize the next flowpipe:
      [z_i(j+1)] = [z](t_i(j+1), t_ij, [z_ij])   if j < p−1
      [z_(i+1)0] = [z](t_(i+1)0, t_ij, [z_ij])   if j = p−1
```

**Algorithm 1.** Sketch of the computation of outer reachable sets for a DDE

**Step 1: Computing an a Priori Enclosure.** We need an a priori enclosure $[\overline{\boldsymbol{z}}_{ij}]$ of the solution $z(t)$, valid on the time interval $[t_{ij}, t_{i(j+1)}]$. This is done by a straightforward extension of the classical approach [26] for ODEs relying on the interval Picard-Lindelöf method, applied to Eq. (10) on $[t_{ij}, t_{i(j+1)}]$ with initial condition $[\boldsymbol{z}_{ij}]$. If $[\boldsymbol{f}]$ is Lipschitz, the natural interval extension $[\boldsymbol{F}]$ of the Picard-Lindelöf operator, defined by $[\boldsymbol{F}](\boldsymbol{z}) = [\boldsymbol{z}_{ij}] + [t_{ij}, t_{i(j+1)}]\,[\boldsymbol{f}](\boldsymbol{z}, [\overline{\boldsymbol{z}}_{i(j-1)}], \boldsymbol{\beta})$, where the enclosure of the solution over $\boldsymbol{r}_{i(j-1)} = [t_{i(j-1)}, t_{ij}]$ has already been computed as $[\overline{\boldsymbol{z}}_{i(j-1)}]$, admits a unique fixpoint. A simple Jacobi-like iteration, $\boldsymbol{z}^0 = [\boldsymbol{z}_{ij}]$, $\boldsymbol{z}^{l+1} = [\boldsymbol{F}](\boldsymbol{z}^l)$ for all $l \in \mathbb{N}$, suffices to reach the fixpoint: it yields $[\overline{\boldsymbol{z}}_{ij}]$, and ensures the existence and uniqueness of a solution to (10) on $[t_{ij}, t_{i(j+1)}]$. However, it may be necessary to reduce the step size.

**Step 2: Building the Taylor Model.** A Taylor expansion of order $k$ of the solution at $t_{ij}$, valid on the time interval $[t_{ij}, t_{i(j+1)}]$, for $i \ge 1$, is

$$[\boldsymbol{z}](t, t_{ij}, [\boldsymbol{z}_{ij}]) = [\boldsymbol{z}_{ij}] + \sum_{l=1}^{k-1} (t - t_{ij})^l [\boldsymbol{f}_{ij}]^{[l]} + (t - t_{ij})^k [\overline{\boldsymbol{f}}_{ij}]^{[k]}. \tag{11}$$

The Taylor coefficients are defined inductively, and can be computed by automatic differentiation, as follows:

$$[\boldsymbol{f}_{ij}]^{[1]} = [\boldsymbol{f}]\big([\boldsymbol{z}_{ij}], [\boldsymbol{z}_{(i-1)j}], \boldsymbol{\beta}\big) \tag{12}$$

$$[\boldsymbol{f}_{1j}]^{[l+1]} = \frac{1}{l+1} \left( \left[\frac{\partial \boldsymbol{f}^{[l]}}{\partial z}\right] [\boldsymbol{f}_{1j}]^{[1]} + \left[\frac{\partial \boldsymbol{f}^{[l]}}{\partial z^{\tau}}\right] [\dot{\boldsymbol{z}}_{0j}] \right) \tag{13}$$

$$[\boldsymbol{f}_{ij}]^{[l+1]} = \frac{1}{l+1} \left( \left[\frac{\partial \boldsymbol{f}^{[l]}}{\partial z}\right] [\boldsymbol{f}_{ij}]^{[1]} + \left[\frac{\partial \boldsymbol{f}^{[l]}}{\partial z^{\tau}}\right] [\boldsymbol{f}_{(i-1)j}]^{[1]} \right) \quad \text{if } i \ge 2 \tag{14}$$

The Taylor coefficients for the remainder term are computed in a similar way, evaluating $[\boldsymbol{f}]$ over the a priori enclosure of the solution on $\boldsymbol{r}_{ij} = [t_{ij}, t_{i(j+1)}]$. For instance, $[\overline{\boldsymbol{f}}_{ij}]^{[1]} = [\boldsymbol{f}]([\overline{\boldsymbol{z}}_{ij}], [\overline{\boldsymbol{z}}_{(i-1)j}], \boldsymbol{\beta})$. The derivatives can be discontinuous at $t_{i0}$: the $[\boldsymbol{f}_{i0}]^{[l]}$ coefficients correspond to the right-handed limit, at time $t_{i0}^+$.

Let us detail the computation of the coefficients (12), (13) and (14). Let $z(t)$ be the solution of (10). By definition, $\frac{dz}{dt}(t) = f(z(t), z(t-\tau), \beta) = f^{[1]}(z(t), z(t-\tau), \beta)$, from which we deduce the set-valued version (12). We can prove (14) by induction on $l$. Let us denote by $\partial z$ the partial derivative with respect to $z(t)$, and by $\partial z^{\tau}$ the partial derivative with respect to the delayed function $z(t - \tau)$. We have

$$\begin{aligned} f^{[l+1]}(z(t), z(t-\tau), \beta) &= \frac{1}{(l+1)!} \frac{d^{\,l+1}z}{dt^{\,l+1}}(t) = \frac{1}{l+1} \frac{d}{dt} \left( f^{[l]}(z(t), z(t-\tau), \beta) \right) \\ &= \frac{1}{l+1} \left( \dot{z}(t) \frac{\partial f^{[l]}}{\partial z} + \dot{z}(t-\tau) \frac{\partial f^{[l]}}{\partial z^{\tau}} \right) \\ &= \frac{1}{l+1} \left( f(z(t), z(t-\tau), \beta) \frac{\partial f^{[l]}}{\partial z} + f(z(t-\tau), z(t-2\tau), \beta) \frac{\partial f^{[l]}}{\partial z^{\tau}} \right) \end{aligned}$$

from which we deduce the set-valued version (14). For $t \in [t_0 + \tau, t_0 + 2\tau]$, the only difference is that $\dot{z}(t - \tau)$ is obtained by differentiating the initial solution of the DDE on $[t_0, t_0 + \tau]$, which yields (13).

*Example 7.* As in Example 6, we build the first step of the Taylor model of order $k = 2$ on the system of Example 1. We consider $t \in [t_0 + \tau, t_0 + 2\tau]$, on a grid of step size $h = 1/3$. Let us build the Taylor model on $[t_0 + \tau, t_0 + \tau + h] = [0, 1/3]$: we need to evaluate $[\boldsymbol{x}_{10}]$, $[\boldsymbol{f}_{10}]^{[1]}$ and $[\overline{\boldsymbol{f}}_{10}]^{[2]}$ in affine arithmetic.

Following Algorithm 1, [*x*10]=[*x*0](t10, *<sup>β</sup>*)=[*x*0](t<sup>0</sup> <sup>+</sup>τ, *<sup>β</sup>*)=[*x*0](0, *<sup>β</sup>*) = 1. Using (12) and the computation of [*x*00] of Example 5, [*f* <sup>10</sup>] [1] = [*f*]([*x*10], [*x*00]) = [*f*](1, <sup>1</sup> <sup>9</sup> (1.<sup>5</sup> <sup>−</sup> <sup>2</sup>ε<sup>1</sup> + 0.5ε<sup>2</sup>)) = <sup>−</sup><sup>1</sup> <sup>9</sup> (1.<sup>5</sup> <sup>−</sup> <sup>2</sup>ε<sup>1</sup> + 0.5ε<sup>2</sup>). Finally, using (13), [*f* <sup>10</sup>] [2] = 0.<sup>5</sup> ˙f(*r*<sup>10</sup>, *<sup>r</sup>*00), where *<sup>r</sup>*<sup>i</sup><sup>0</sup> for <sup>i</sup> = 0, 1 (with <sup>r</sup><sup>00</sup> <sup>=</sup> <sup>r</sup><sup>10</sup> <sup>−</sup> <sup>τ</sup> ) is the time interval of width h equal to [t<sup>i</sup>0, t<sup>i</sup>1]=[−1+i, <sup>−</sup>1+i+ 1/3], and ˙f(t, t<sup>−</sup> τ )= ˙x(t)x(t−τ ) +x(t) ˙x(t−τ ) = f(t, t−τ )x(t−τ ) +x(t) ˙x<sup>0</sup>(t−<sup>τ</sup> ) = <sup>−</sup>x(t)x(t<sup>−</sup> τ )<sup>2</sup>+2x(t)β(1+βt). Thus, [*<sup>f</sup>* <sup>10</sup>] [2] <sup>=</sup> <sup>−</sup>0.5[*x*(*r*10)][*x*(*r*00)]<sup>2</sup>+[*x*(*r*10)]*β*(1+*βr*10). We need enclosures for *<sup>x</sup>*(*r*00) and <sup>x</sup>(*r*10), to compute this expression. Enclosure [*x*(*r*00)] is directly obtained as [*x*0](*r*00) = (1+*βr*00)<sup>2</sup>, evaluated in affine arithmetic. Evaluating [*x*(*r*10)] requires to compute an a priori enclosure of the solution on interval *r*10, following the approach described as Step 1 in Algorithm 1. The Picard-Lindel¨of operator is [*F*](*x*)=[*x*10] + [0, <sup>1</sup> <sup>3</sup> ][*f*](*x*, [*x*(*r*00)], *<sup>β</sup>*) = 1 + [0, <sup>1</sup> <sup>3</sup> ](1 + *βr*00)<sup>2</sup>*x*. We evaluate it in interval rather than affine arithmetic for simplicity: [*F*](*x*) = 1 + [0, <sup>1</sup> 3 ] 1+[ <sup>1</sup> <sup>3</sup> , 1][−1, <sup>−</sup><sup>2</sup> 3 ] 2 *<sup>x</sup>* = 1 + [0, <sup>7</sup><sup>2</sup> <sup>3</sup><sup>5</sup> ]*x*. 
Starting with $x^0 = [x_{10}] = 1$, we compute $x^1 = [F](1) = [1, 1 + \frac{7^2}{3^5}]$, $x^2 = [F](x^1) = [1, 1 + \frac{7^2}{3^5} + (\frac{7^2}{3^5})^2]$, etc. This is a geometric progression that converges to a finite enclosure.
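The fixed-point iteration above is easy to sketch concretely. Below is a small illustrative Python version (our own toy, using plain floating-point intervals, without the outward rounding or affine forms that a sound implementation needs); `picard_enclosure` and its parameters are ours, with the operator $[F](x) = 1 + [0, 49/243]\,x$ hard-coded from the computation above:

```python
def picard_enclosure(c_hi=49/243, tol=1e-12, max_iter=1000):
    """Iterate [F]([x]) = 1 + [0, c_hi] * [x] on intervals [lo, hi],
    starting from the degenerate interval [1, 1], until stabilisation.
    All quantities are nonnegative here, so the interval product
    [0, c_hi] * [lo, hi] is simply [0, c_hi * hi]."""
    lo, hi = 1.0, 1.0
    for _ in range(max_iter):
        new_lo, new_hi = 1.0, 1.0 + c_hi * hi
        if abs(new_hi - hi) < tol and abs(new_lo - lo) < tol:
            return (new_lo, new_hi)
        lo, hi = new_lo, new_hi
    raise RuntimeError("Picard iteration did not stabilise")

# The upper bounds form the geometric progression 1 + c + c^2 + ... with
# ratio c = 49/243 < 1, converging to 1 / (1 - 49/243) = 243/194 ~ 1.2526.
enclosure = picard_enclosure()
```

For a sound a priori enclosure one additionally checks the contraction $[F]([x]) \subseteq [x]$, typically after slightly inflating the stabilised candidate.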

*Remark.* A fixed step size yields a simpler algorithm. However, it is possible to use a variable step size, at the cost of an additional interpolation of the Taylor models.

### **4 Inner-Approximating Flowpipes**

We now use Theorem 1 to compute inner-approximating flowpipes from outer-approximating flowpipes, extending the work of [16] on ODEs to the case of DDEs. The main idea is to instantiate, in this theorem, the function $f$ as the solution $z(t, \beta)$ of our uncertain system (1) for each $t$, and $x$ as the range $\boldsymbol{\beta}$ of the uncertain parameters. For this, we need to compute an outer-approximation of $z(t, \tilde{\beta})$ for some $\tilde{\beta} \in \boldsymbol{\beta}$, and of its Jacobian matrix with respect to $\beta$, at any time $t$ and over the range $\boldsymbol{\beta}$. We follow the approach described in Sect. 3.4.
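To make this instantiation concrete, the two enclosures (center value and Jacobian range) are combined through an interval mean-value extension. A minimal Python sketch, on the toy function $f(\beta) = \beta^2$ (our own illustration, not the paper's implementation, and without outward rounding):

```python
def imul(a, b):
    """Interval multiplication [a] * [b]."""
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(ps), max(ps))

def mean_value_extension(f_center, df_range, box, center):
    """Outer-approximate f(box) by f(center) + f'(box) * (box - center)."""
    dev = (box[0] - center, box[1] - center)
    lo, hi = imul(df_range, dev)
    return (f_center + lo, f_center + hi)

# f(beta) = beta^2 on [1, 2], centered at 1.5; f' = 2*beta ranges in [2, 4]
outer = mean_value_extension(f_center=2.25, df_range=(2.0, 4.0),
                             box=(1.0, 2.0), center=1.5)
# outer = (0.25, 4.25): a guaranteed superset of the true range [1, 4]
```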

**Outer-Approximation of the Jacobian Matrix Coefficients.** For the DDE (1) in arbitrary dimension $n \in \mathbb{N}$ and with parameter dimension $m \in \mathbb{N}$, the Jacobian matrix of the solution $z = (z_1, \ldots, z_n)$ of this system with respect to the parameters $\beta = (\beta_1, \ldots, \beta_m)$ is

$$J_{ij}(t) = \frac{\partial z_i}{\partial \beta_j}(t)$$

for i between 1 and n, j between 1 and m. Differentiating (1), we obtain that the coefficients of the Jacobian matrix of the flow satisfy

$$\dot{J}_{ij}(t) = \sum_{k=1}^{n} \frac{\partial f_i}{\partial z_k}(t)\, J_{kj}(t) + \sum_{k=1}^{n} \frac{\partial f_i}{\partial z_k^{\tau}}(t)\, J_{kj}(t - \tau) + \frac{\partial f_i}{\partial \beta_j}(t) \tag{15}$$

with initial condition $J_{ij}(t) = (J_{ij})_0(t, \beta) = \frac{\partial (z_i)_0}{\partial \beta_j}(t, \beta)$ for $t \in [t_0, t_0 + \tau]$.

*Example 8.* The Jacobian matrix for Example 1 is a scalar, since the DDE is real-valued and the parameter is scalar. We easily get $\dot{J}_{11}(t) = -x(t-\tau)J_{11}(t) - x(t)J_{11}(t-\tau)$, with initial condition $(J_{11})_0(t, \beta) = 2t(1+\beta t)$.

Equation (15) is a DDE of the same form as (1). We can thus use the method introduced in Sect. 3.4, and use Taylor models to compute outer-approximating flowpipes for the coefficients of the Jacobian matrix.
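Since (15) has the same shape as the original DDE, the Jacobian can be integrated alongside the solution. A plain explicit-Euler sketch for the scalar system of Example 8 (our own illustration; the paper uses Taylor models, and `solve`, the step sizes and the finite-difference check below are our own choices), validated against a central finite difference in $\beta$:

```python
def solve(beta, T=1.0, h=1e-3, tau=1.0):
    """Euler integration of x'(t) = -x(t) x(t - tau) together with its
    variational equation J'(t) = -x(t-tau) J(t) - x(t) J(t-tau), with
    history x0(t) = (1 + beta*t)^2 and J0(t) = 2 t (1 + beta*t) on [-tau, 0].
    Returns (x(T), J(T))."""
    d = round(tau / h)                      # delay, in steps
    n = round(T / h)
    x = [0.0] * (d + n + 1)
    J = [0.0] * (d + n + 1)
    for k in range(d + 1):                  # fill the history on [-tau, 0]
        t = -tau + k * h
        x[k] = (1 + beta * t) ** 2
        J[k] = 2 * t * (1 + beta * t)
    for k in range(d, d + n):               # step forward from t = 0
        x[k + 1] = x[k] + h * (-x[k] * x[k - d])
        J[k + 1] = J[k] + h * (-x[k - d] * J[k] - x[k] * J[k - d])
    return x[-1], J[-1]

beta, delta = 0.5, 1e-4
_, J_var = solve(beta)
x_plus, _ = solve(beta + delta)
x_minus, _ = solve(beta - delta)
J_fd = (x_plus - x_minus) / (2 * delta)     # central finite difference
# J_var and J_fd agree closely: Euler applied to the variational DDE is
# the exact beta-derivative of the Euler scheme itself.
```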

**Computing Inner-Approximating Flowpipes.** As for ODEs [16], the algorithm that computes inner-approximating flowpipes first uses Algorithm 1 to compute outer-approximations, on each time interval $[t_{ij}, t_{i(j+1)}]$, of the solution $z(t, \tilde{\beta})$ for the center $\tilde{\beta}$ of the parameter set, and of the coefficients of the Jacobian matrix $J(t, \beta)$ over the full range $\boldsymbol{\beta}$.
Then, we can deduce inner-approximating flowpipes using Theorem 1. Let $\beta = (\beta_{\mathcal{A}}, \beta_{\mathcal{E}})$ as in Definition 3, and denote by $J_{\mathcal{A}}$ the matrix obtained by extracting the columns of the Jacobian corresponding to the partial derivatives with respect to $\beta_{\mathcal{A}}$, and by $J_{\mathcal{E}}$ the remaining columns. If the quantity defined by Eq. (16) for $t$ in $[t_{ij}, t_{i(j+1)}]$ is an improper interval

$$\begin{split} ]z[_{\mathcal{A}}(t, t_{ij}, \beta_{\mathcal{A}}, \beta_{\mathcal{E}}) = {} & [\tilde{z}](t, t_{ij}, [\tilde{z}_{ij}]) + [J]_{\mathcal{A}}(t, t_{ij}, [J_{ij}])(\beta_{\mathcal{A}} - \tilde{\beta}_{\mathcal{A}}) \\ & + [J]_{\mathcal{E}}(t, t_{ij}, [J_{ij}])(\operatorname{dual} \beta_{\mathcal{E}} - \tilde{\beta}_{\mathcal{E}}) \end{split} \tag{16}$$

then the interval $\operatorname{pro}\, ]z[_{\mathcal{A}}(t, t_{ij}, \beta_{\mathcal{A}}, \beta_{\mathcal{E}})$ is an inner-approximation of the reachable set $z(t, \boldsymbol{\beta})$, valid on the time interval $[t_{ij}, t_{i(j+1)}]$, which is robust with respect to the parameters $\beta_{\mathcal{A}}$ in the sense of Definition 3. Otherwise, the inner-approximation is empty. If all parameters are existentially quantified, that is, if the subset $\beta_{\mathcal{A}}$ is empty, we obtain the classical inner-approximation of Definition 2. Note that a single computation of the center solution $[\tilde{z}]$ and of the Jacobian matrix $[J]$ can be reused to infer different interpretations as inner-approximations or robust inner-approximations. With this computation, the robust inner flowpipes are always included in the classical inner flowpipes.

The computation of the inner-approximations fully relies on the outer-approximations at each time step. As a consequence, we can soundly implement most of our approach using classical interval-based methods: outward rounding is used for the outer-approximations of flows and Jacobians. Only the final computation of improper intervals in Kaucher arithmetic has to be performed with inward rounding, in order to obtain a sound inner-approximation.

Also, the wider the outer-approximations in Taylor models of the center and the Jacobian, the narrower, and thus the less accurate, the inner-approximation. This can even lead to an empty inner-approximation, when the result of Eq. (16) in Kaucher arithmetic is not an improper interval. This can occur in two ways. Firstly, the Kaucher multiplication $[J]_{\mathcal{E}}(\operatorname{dual} \beta_{\mathcal{E}} - \tilde{\beta}_{\mathcal{E}})$ in (16) yields a nonzero improper interval only if the Jacobian coefficients do not contain 0. Secondly, suppose that the Kaucher multiplication does yield an improper interval. It is added to the proper interval $[\tilde{z}](t, t_{ij}, [\tilde{z}_{ij}]) + [J]_{\mathcal{A}}(\beta_{\mathcal{A}} - \tilde{\beta}_{\mathcal{A}})$. The center solution $[\tilde{z}](t, t_{ij}, [\tilde{z}_{ij}])$ can be tightly estimated, but the term $[J]_{\mathcal{A}}(\beta_{\mathcal{A}} - \tilde{\beta}_{\mathcal{A}})$, which measures robustness with respect to the $\beta_{\mathcal{A}}$ parameters, can lead to a wide enclosure. If this sum is wider than the improper interval resulting from the Kaucher multiplication, then the resulting Kaucher sum is proper and the inner-approximation is empty.
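For a single scalar component with only existentially quantified parameters, this final Kaucher step reduces to a few lines. A hedged Python sketch (the helper `inner_scalar` and its interface are our own, floating-point arithmetic is used instead of the inward rounding a sound implementation requires):

```python
def inner_scalar(center, jac, r):
    """One scalar instance of the Kaucher step (16), existential case:
    center = outer enclosure [c_lo, c_hi] of the center solution,
    jac    = outer enclosure [a, b] of the Jacobian over the parameter box,
    r      = half-width of the parameter box around its center.
    Returns an inner approximation of the reachable set, or None if empty."""
    c_lo, c_hi = center
    a, b = jac
    if a <= 0.0 <= b:                 # Jacobian enclosure contains 0:
        return None                   # the Kaucher product collapses
    m = min(abs(a), abs(b))           # smallest Jacobian magnitude
    lo, hi = c_hi - m * r, c_lo + m * r
    if lo > hi:                       # the Kaucher sum stayed proper
        return None
    return (lo, hi)

# f(beta) = 2*beta on beta in [-1, 1]: center enclosure [-0.1, 0.1],
# Jacobian enclosure [2, 2]; the true range is [-2, 2].
print(inner_scalar((-0.1, 0.1), (2.0, 2.0), 1.0))   # (-1.9, 1.9)
# A too-wide center enclosure makes the inner approximation empty:
print(inner_scalar((-3.0, 3.0), (2.0, 2.0), 1.0))   # None
```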

### **5 Implementation and Experiments**

We have implemented our method using the FILIB++ C++ library [23] for interval computations, the FADBAD++<sup>2</sup> package for automatic differentiation, and (a slightly modified version of) the aaflib<sup>3</sup> library for affine arithmetic.

Let us first consider the running example, with Taylor models of order 2 and an integration step size of 0.05. Figure 1 (left) presents the results until $t = 2$ (obtained in 0.03 s), compared to the analytical solution (dashed lines): the solid external lines represent the outer-approximating flowpipe, and the filled region the inner-approximating flowpipe. Until time $t = 0$, the DDE is in its initialization phase, and the conservativeness of the outer-approximation is due to the abstraction in affine arithmetic of the set of initialization functions. Using higher-order Taylor models or refining the time step improves the accuracy. For the inner-approximation, however, there is a specific difficulty: the Jacobian contains 0 at $t = -1$, so that the inner-approximation is reduced to a point. This case corresponds to the parameter value $\beta = 1$. To address this problem, we split the initial parameter set into two sub-intervals of equal width, compute the inner and outer flowpipes independently for the two parameter ranges, and then join the results to obtain Fig. 1 (center). It is somewhat counter-intuitive that we obtain in this way a larger, and thus better-quality, inner-approximating set: the inner-approximation corresponds to the property that there exists a value of $\beta$ in the parameter set such that a point of the tube is definitely reached, so taking a larger parameter set should intuitively lead to a larger inner tube. The improvement is due in particular to the fact that the subdivision avoids the zero in the Jacobian. More generally, such a subdivision yields a tighter outer-approximation of the Jacobian, and thus better accuracy when using the mean-value theorem.

<sup>2</sup> http://www.fadbad.com.

<sup>3</sup> http://aaflib.sourceforge.net.

**Fig. 1.** Running example (Taylor model order 2, step size 0.05)

In order to obtain an inner-approximation without holes, we can subdivide the parameter set with some overlap between the sub-intervals; this is the case, for instance, with 10 subdivisions and a 10% overlap. The results are then much tighter: Fig. 1 (right) represents a measure $\gamma(x, t)$ of the quality of the approximations (computed in 45 s) for a time horizon $T = 15$, with Taylor models of order 3 and a step size of 0.02. This accuracy measure is defined by $\gamma(x, t) = \gamma_u(x)/\gamma_o(x)$, where $\gamma_u(x)$ and $\gamma_o(x)$ measure, respectively, the width of the inner-approximation and of the outer-approximation for state variable $x$. Intuitively, the larger this ratio (which is bounded by 1), the better the approximation. Here, $\gamma(x, t)$ almost stabilizes after some time, at a high accuracy of 0.975. We noted that in this example, the order of the Taylor model, the step size and the number of initial subdivisions all have a notable impact on the stabilized value of $\gamma$, whose distance to 1 can here be decreased arbitrarily.
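The measure $\gamma$ is a simple width ratio; a one-line Python sketch, with illustrative interval values of our own (not values read off the paper's figures):

```python
def gamma(inner, outer):
    """Accuracy measure: width of the inner-approximation divided by the
    width of the outer-approximation (bounded by 1, larger is better)."""
    return (inner[1] - inner[0]) / (outer[1] - outer[0])

ratio = gamma(inner=(-1.9, 1.9), outer=(-2.1, 2.1))   # ~0.905
```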

*Example 9.* Consider a basic PD controller for a self-driving car, controlling the car's position $x$ and velocity $v$ by adjusting its acceleration depending on the current distance to a reference position $p_r$, chosen here as $p_r = 1$. We consider a delay $\tau$ in the transfer of the input data to the controller, due to sensing, computation or transmission times. This leads, for $t \geq 0$, to:

$$\begin{cases} x'(t) = v(t) \\ v'(t) = -K\_p \left( x(t-\tau) - p\_r \right) - K\_d v(t-\tau) \end{cases}$$

Choosing $K_p = 2$ and $K_d = 3$ guarantees the asymptotic stability of the controlled system when there is no delay. The system is initialized to a constant function $(x, v) \in [-0.1, 0.1] \times [0, 0.1]$ on the time interval $[-\tau, 0]$.

**Fig. 2.** Left and center: velocity and position of the controlled car (left: $\tau = 0.35$; center: $\tau = 0.2$). Right: vehicle positions in the platoon example

This example demonstrates that even small delays can have a huge impact on the dynamics. We represent in the left subplot of Fig. 2 the inner- and outer-approximating flowpipes for the velocity and position, with delay $\tau = 0.35$, until time $T = 10$. They are obtained in 0.32 s, using Taylor models of order 3 and a time step of 0.03. The parameters were chosen such that the inner-approximation always remains non-empty. We now study the robustness of the behavior of the system with respect to the parameters: $K_p$ and $K_d$ are time-invariant, but now uncertain, and known to be bounded by $(K_p, K_d) \in [1.95, 2.05] \times [2.95, 3.05]$. The Jacobian matrix is now of dimension $2 \times 4$. We choose a delay $\tau = 0.2$, sufficiently small not to induce oscillations. Thanks to the outer-approximation, we prove that the velocity never becomes negative, in contrast to the case $\tau = 0.35$, where it is proved to oscillate. In Fig. 2 (center), we represent, along with the outer-approximation, the inner-approximation and a robust inner-approximation. The inner-approximation, in the sense of Definition 2, contains only states for which it is proved that there exist an initialization of the state variables $x$ and $v$ in $[-0.1, 0.1] \times [0, 0.1]$ and values of $K_p$ and $K_d$ in $[1.95, 2.05] \times [2.95, 3.05]$ such that these states are solutions of the DDE. The inner-approximation which is robust with respect to the uncertainty in $K_p$ and $K_d$, in the sense of Definition 3, contains only states for which it is proved that, whatever the values of $K_p$ and $K_d$ in $[1.95, 2.05] \times [2.95, 3.05]$, there exists an initialization of $x$ and $v$ in $[-0.1, 0.1] \times [0, 0.1]$ such that these states are solutions of the DDE. These results are obtained in 0.24 s, with order 3 Taylor models and a time step of 0.04.
The robust inner-approximation is naturally included in the inner-approximation.
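The qualitative difference between the two delays can also be seen on a pointwise simulation of a single nominal trajectory (our own explicit-Euler sketch, one initial condition and no enclosures, so it carries none of the guarantees of the flowpipes above):

```python
def simulate(tau, Kp=2.0, Kd=3.0, x0=0.0, v0=0.05, pr=1.0, T=10.0, h=0.01):
    """Explicit-Euler simulation of x'(t) = v(t),
    v'(t) = -Kp (x(t - tau) - pr) - Kd v(t - tau),
    with constant history (x0, v0) on [-tau, 0]."""
    d = round(tau / h)                      # delay, in steps
    n = round(T / h)
    x = [x0] * (d + n + 1)
    v = [v0] * (d + n + 1)
    for k in range(d, d + n):
        x[k + 1] = x[k] + h * v[k]
        v[k + 1] = v[k] + h * (-Kp * (x[k - d] - pr) - Kd * v[k - d])
    return x, v

x35, v35 = simulate(tau=0.35)
x20, v20 = simulate(tau=0.20)
# with the larger delay the trajectory overshoots, and its velocity dips
# below the minimum velocity reached by the tau = 0.2 trajectory
```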

We now demonstrate the efficiency of our approach and its good scaling behavior with respect to the dimension of the state space, by comparing our results with the results of [30] on their seven-dimensional Example 3:

*Example 10.* Let $\dot{x}(t) = f(x(t), x(t-\tau))$, $t \in [\tau, T]$ with $\tau = 0.01$, where $f(x(t), x(t-\tau)) = (1.4x_3(t) - 0.9x_1(t-\tau),\ 2.5x_5(t) - 1.5x_2(t),\ 0.6x_7(t) - 0.8x_3(t)x_2(t),\ 2 - 1.3x_4(t)x_3(t),\ 0.7x_1(t) - x_4(t)x_5(t),\ 0.3x_1(t) - 3.1x_6(t),\ 1.8x_6(t) - 1.5x_7(t)x_2(t))$, and the initial function is constant on $[-\tau, 0]$, with values in the box<sup>4</sup> $[1.0, 1.2] \times [0.95, 1.15] \times [1.4, 1.6] \times [2.3, 2.5] \times [0.9, 1.1] \times [0.0, 0.2] \times [0.35, 0.55]$. We compute outer and inner approximations of the reachable sets of the DDE until time $t = 0.1$, and compare the quality measure

<sup>4</sup> The first component is different from that given in [30], but is the correct initial condition, after discussion with the authors.

$\gamma(x_1), \ldots, \gamma(x_7)$ for the projections of the approximations onto each variable $x_1$ to $x_7$, of our method with respect to [30]. We obtain for our work the measures 0.998, 0.996, 0.978, 0.964, 0.97, 0.9997, 0.961, to be compared to 0.575, 0.525, 0.527, 0.543, 0.477, 0.366, 0.523 for [30]. The results, computed with order 2 Taylor models, are obtained in 0.13 s with our method, versus 505 s with [30]. Our implementation is thus both much faster and much more accurate. However, this comparison should only be taken as a rough indication, as it is unfair to [30] to compare their inner boxes to our projections on each component.

*Example 11.* Consider now the model, adapted from [11], of a platoon of $n$ autonomous vehicles. Vehicle $C_{i+1}$ drives just behind $C_i$, for $i = 1$ to $n-1$; vehicle $C_1$ is the leading vehicle. The sensors of $C_{i+1}$ measure its current speed $v_{i+1}$ as well as the speed $v_i$ of the vehicle just in front of it. Their respective positions are $x_{i+1}$ and $x_i$. We take a simple model where each vehicle $C_{i+1}$ accelerates so as to catch up with $C_i$ if it measures that $v_i > v_{i+1}$, and acts on its brakes if $v_i < v_{i+1}$. Because of communication, accelerations are delayed by some time constant $\tau$:

$$\begin{aligned} \dot{x}\_i(t) &= v\_i(t) & i &= 2, \cdots, n \\ \dot{v}\_{i+1}(t) &= \alpha (v\_i(t-\tau) - v\_{i+1}(t-\tau)) & i &= 2, \cdots, n-1 \end{aligned}$$

We add an equation defining the way the leading car drives. We suppose that it adapts its speed between 1 and 3, following a polynomial law; the acceleration of vehicle $C_2$ is adapted accordingly:

$$\begin{aligned} \dot{x}\_1(t) &= 2 + (x\_1(t)/5 - 1)(x\_1(t)/5 - 2)(x\_1(t)/5 - 3)/6 \\ \dot{v}\_2(t) &= \alpha (2 + (x\_1(t)/5 - 1)(x\_1(t)/5 - 2)(x\_1(t)/5 - 3)/6 - v\_2(t - \tau)) \end{aligned}$$

We choose $\tau = 0.3$ and $\alpha = 2.5$. The initial position of car $C_i$ before time 0 is slightly uncertain, taken as $-(i-1) + [-0.2, 0.2]$, and its speed is in $[1.99, 2.01]$. We represent in the right subplot of Fig. 2 the inner and outer approximations of the positions of the vehicles in a 5-vehicle platoon (a 9-dimensional system) until time $T = 10$, with a time step of 0.1 and order 3 Taylor models, computed in 2.13 s. As the inner-approximations of different vehicles intersect, there are some unsafe initial conditions for which the vehicles will collide. This example also demonstrates the good scaling of our method: for 10 vehicles (a 19-dimensional system) and with the same parameters, the results are obtained in 6.5 s.
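A pointwise Euler simulation of one nominal platoon trajectory can be sketched as follows (our own illustration, single initial condition with no uncertainty, so it says nothing about the guaranteed flowpipes; the helper names are ours):

```python
def platoon(n=5, tau=0.3, alpha=2.5, T=10.0, h=0.01):
    """Explicit-Euler simulation of one nominal trajectory of the platoon:
    the leader position x_1 follows the polynomial speed law; each follower
    accelerates on the delayed speed difference with its predecessor.
    Nominal initial state: x_i = -(i-1), v_i = 2 on [-tau, 0]."""
    d = round(tau / h)                     # delay, in steps
    m = round(T / h)
    x = [[-float(i)] * (d + m + 1) for i in range(n)]   # positions
    v = [[2.0] * (d + m + 1) for i in range(n)]         # speeds (v[0] unused)

    def lead_speed(x1):
        u = x1 / 5.0
        return 2.0 + (u - 1.0) * (u - 2.0) * (u - 3.0) / 6.0

    for k in range(d, d + m):
        x[0][k + 1] = x[0][k] + h * lead_speed(x[0][k])
        for i in range(1, n):
            x[i][k + 1] = x[i][k] + h * v[i][k]
            if i == 1:
                # v_2 tracks the leader's (undelayed) speed law, as in the
                # displayed equations; its own speed enters with delay tau
                acc = alpha * (lead_speed(x[0][k]) - v[1][k - d])
            else:
                acc = alpha * (v[i - 1][k - d] - v[i][k - d])
            v[i][k + 1] = v[i][k] + h * acc
    return x, v

xs, vs = platoon()
```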

# **6 Conclusion**

We have shown how to compute, efficiently and accurately, outer and inner flowpipes for DDEs with constant delay, using Taylor models combined with an efficient space abstraction. We have also introduced a notion of robust inner-approximation, which can be computed by the same method. We would like to extend this work to fully general DDEs, including variable delays, and to further study the use of such computations for property verification on networked control systems. Indeed, while testing is a weaker alternative to inner-approximation for property falsification, we believe that robust inner-approximation provides new tools towards robust property verification and control synthesis.

**Acknowledgments.** The authors were supported by ANR project MALTHY, ANR-13-INSE-0003, DGA project "Complex Robotics Systems Safety" and the academic chair "Complex Systems Engineering" of École Polytechnique-ENSTA-Télécom-Thalès-Dassault-DCNS-DGA-FX-FDO-Fondation ParisTech.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
